Show Posts


Messages - tria

Pages: [1] 2

What is a good microphone for speech recognition? / Andrea Array-2S

« on: April 05, 2012, 10:48:02 AM »

Hello,

I purchased the Andrea Array-2S for around 40 USD. It is a microphone array (stereo microphones), and I believe it performs well as a hands-free/head-free mic. I tried placing it away from me (around 1.5 meters), and it was able to pick up my voice and have it recognized at around 75-92% accuracy in VC. The mic comes with an external audio card that connects via USB, plus companion software (you need to download it). The software lets you tweak various settings, such as volume control, noise cancellation, echo cancellation, and beam direction, so the mic focuses only on a certain range/direction. I must say that you have to enable/disable these options and play with the volume to find the best result. For example, when I moved the mic closer without changing the configuration, I ended up with 0% recognition (or wrong recognition). But after adjusting the settings again, I got better recognition. The placement/direction affects it as well.

As a regular mic, I must admit it is great at capturing a clear voice, and it is suitable for VoIP calls and such, even when I move it farther away. As a mic for voice recognition, it does a fairly good job, but sometimes it is not accurate and needs lots of adjustment. The only problem is that it cannot differentiate between audio coming from the speakers and my voice when I am watching movies at high volume. But all in all, I think it is suitable for voice recognition and VC.

Just wanted to share my experience in the hope that it might be of help to others.

General Discussion / Re: Possible ways for Arabic support

« on: March 02, 2012, 04:45:51 PM »

Oh, I see. Then from what I saw, you are not bad at it. I met some people while there who are like you (some of them were good at it, some were just okay). What are the chances of you being one of them, haha. Arabic is not an easy language for many reasons, but you probably already know that.

I am working on a home automation plugin for XBMC, where you can have a house plan and toggle switches/devices on and off visually. I intend to release it once I finish it.

Feature Requests / Re: Group/Action confidence rate + Command flow

« on: March 01, 2012, 03:03:42 PM »

wget uses -- for long options and - for short ones (confusing). You should've used the one I sent (changing the rate), but it's good you figured it out.

Sorry for not being clear.
Two examples:
1). A payload with lots of items, some of which are:
Name
Name Two
Name Three
....
They share the prefix (Name) but differ in the ending (Two, Three). If I create a command using this payload file and say the name "Name", it will definitely pick the first one. Is there a way to get a list of all possible options (Name, Name Two, Name Three) instead of assuming the exact match is the one I meant? And how would I access each of them, like when you talked about expanding {LastResult}?

2). This time we have two payload files: one for items and another for categories. The items file can have the same item name appearing more than once (but with different values). Each item belongs to a category (from the category file), which is why there might be two or more items with the same name but different categories. Is it possible to implement such an approach with payload files?
If so, how do you handle it in VC, so that you can say "item A from category B" or "item A from category C" and get that specific item's value to use in further commands? Or you say "item A", and if it is unique (only one item), it returns the item value right away; if it is not unique, it gives you the list of possible categories and then decides which item value to return. It is really hard to explain, even in my own language.
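A minimal Python sketch of the lookup being asked about, assuming the two payload files have already been loaded into a list of (name, value, category) tuples. The `ITEMS` data, `prefix_matches` and `resolve` are hypothetical illustrations, not VoxCommando's payload format or API: the first lists every item sharing a spoken prefix, the second returns a value when the name is unique (optionally narrowed by category) and otherwise returns the categories to ask about.

```python
# Hypothetical in-memory stand-in for an items payload linked to categories.
# Each entry is (spoken name, value, category). Not VoxCommando's format.
ITEMS = [
    ("Name", "101", "Movies"),
    ("Name Two", "102", "Movies"),
    ("Name Three", "103", "Music"),
    ("Name", "201", "Music"),
]


def prefix_matches(items, spoken):
    """Return every item whose name starts with the spoken phrase."""
    spoken = spoken.lower()
    return [item for item in items if item[0].lower().startswith(spoken)]


def resolve(items, name, category=None):
    """Resolve "item A" or "item A from category B".

    Returns ("value", value) when exactly one item matches,
    ("choose", [categories]) when the name is ambiguous,
    or ("none", []) when nothing matches.
    """
    exact = [it for it in items if it[0].lower() == name.lower()]
    if category is not None:
        exact = [it for it in exact if it[2].lower() == category.lower()]
    if len(exact) == 1:
        return ("value", exact[0][1])
    if not exact:
        return ("none", [])
    # Same name in several categories: hand the choices back to the caller.
    return ("choose", sorted({it[2] for it in exact}))
```

Here `resolve(ITEMS, "Name")` comes back with both categories, which a follow-up prompt (for example via TTS) could offer to the user, while `resolve(ITEMS, "Name", "Music")` returns the value directly.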

Feature Requests / Re: Group/Action confidence rate + Command flow

« on: March 01, 2012, 02:29:12 PM »

For (1): Oh! Sorry, I didn't pay attention to that!
For (2): Hmmm... I like the idea of making groups that are disabled by default, with one action that enables them and disables all other groups, and then toggles them back after one of their actions executes. It would be difficult to maintain as it grows, but it is still possible. I would still welcome support for context flow, even if that is in the far future.
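The enable/disable-groups workaround can be modeled roughly as below. `CommandGroups`, the group names and the phrases are all hypothetical; the sketch only shows the mechanism: a context group starts disabled, one action enables it while disabling the rest, and the previous state is restored afterwards.

```python
class CommandGroups:
    """Hypothetical model of command groups that can be enabled or disabled."""

    def __init__(self, groups, context_groups=()):
        # groups maps a group name to {spoken phrase: action result}.
        self.groups = groups
        # Context groups start disabled; everything else is enabled.
        self.enabled = set(groups) - set(context_groups)
        self._saved = None

    def enter_context(self, group):
        """Enable only `group`, remembering which groups were enabled before."""
        self._saved = set(self.enabled)
        self.enabled = {group}

    def leave_context(self):
        """Restore the groups that were enabled before enter_context()."""
        if self._saved is not None:
            self.enabled = self._saved
            self._saved = None

    def hear(self, phrase):
        """Return the action for `phrase` if an enabled group defines it, else None."""
        for name in sorted(self.enabled):
            action = self.groups[name].get(phrase)
            if action is not None:
                return action
        return None


groups = {
    "main": {"play movie": "ask which movie"},
    "choose movie": {"the first one": "play file 1", "the second one": "play file 2"},
}
vc = CommandGroups(groups, context_groups=["choose movie"])
```

In this model, "play movie" would trigger `vc.enter_context("choose movie")`, so only the answers are recognized until one of them runs and `vc.leave_context()` restores the main group. The maintenance cost mentioned above shows up as one context group per question.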

One question, since you mentioned {LastResult} and the possibility of expanding it. Say I have a list of items that share the same prefix (name-wise), listed in an XML payload file. Is it possible to get the other matching items to present them to the user, using further logic controls or by enabling/disabling groups?
Similarly, imagine a case where the payload file contains items with exactly the same names but different IDs (values). Would it be possible to get a list of them? Is it possible to link them to a category (another payload file) so that they are easier to identify: A that belongs to B, or A that belongs to C (where A is the item name, and B and C are two different category names)?

Feature Requests / Group/Action confidence rate + Command flow

« on: March 01, 2012, 12:29:21 PM »

Hi,

I have two feature requests. They may be hard to implement, but I would appreciate it if they were taken into consideration:

1). Have a confidence rate for every group or action that overrides the global confidence rate when set. This would allow marking extremely critical command groups (shutdown/close/restart...) with a high confidence rate, while other commands that require a low confidence rate (like unsupported languages) use a different rate. Any other command would fall back to the global setting.

2). Have a command flow, where you say one command, and this command then lets you say a specific set of commands (depending on the input to the first command). For example, you say something, VC replies (using TTS) with given choices or another question, you reply with another answer, and so on. Another example: you choose to play a file (in XBMC, for example), but the name you mentioned is a prefix of more than one file, so VC asks whether you want this one or that one, and you answer. Of course, the logic handling should be left to the user (using something similar to the conditional statements in the action builder).
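Request 1 amounts to a "most specific setting wins" lookup. A minimal sketch, assuming confidence is a number from 0 to 1 and that an unset rate is `None`; the names and the 0.60 default are hypothetical, not VoxCommando settings.

```python
GLOBAL_CONFIDENCE = 0.60  # hypothetical global threshold (0..1 scale assumed)


def effective_threshold(action_rate=None, group_rate=None, global_rate=GLOBAL_CONFIDENCE):
    """Most specific setting wins: action, then group, then the global value."""
    for rate in (action_rate, group_rate):
        if rate is not None:
            return rate
    return global_rate


def accept(confidence, action_rate=None, group_rate=None, global_rate=GLOBAL_CONFIDENCE):
    """True if a recognition result is confident enough to execute."""
    return confidence >= effective_threshold(action_rate, group_rate, global_rate)
```

With this, a "shutdown" group set to 0.90 would reject a 0.75 result that the global setting would have accepted, while a group for an unsupported language could be set lower than the global value.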

What do you guys think?

Feature Requests / Re: noise-cancellation/post-processing

« on: February 26, 2012, 04:21:24 AM »

Thanks jitterjames. I am not currently thinking about doing noise reduction, but I might in the future.

But your wav-watcher idea got my attention. Is there a similar wav-spitter (if you don't mind the name) that spits out/outputs the recorded voice from your application in real time (and can call something else with every update to the file)? This would be useful for the other utility I'm planning. Check your PM, please.
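The "call something else with every update to the file" part could be approximated on the consumer side by polling the file's size and modification time. This is a swapped-in technique (simple polling), not how VC's wav-watcher works; `file_state` and `watch` are hypothetical names, and a real utility might prefer OS change notifications.

```python
import os
import time


def file_state(path):
    """Return (size, mtime in ns) for path, or None if it does not exist yet."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_size, st.st_mtime_ns)


def watch(path, on_update, interval=0.2, max_polls=None):
    """Poll `path` and call on_update(path) whenever its size or mtime changes.

    Runs forever when max_polls is None; otherwise stops after max_polls checks.
    """
    last = file_state(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = file_state(path)
        if current is not None and current != last:
            on_update(path)
        last = current
        polls += 1
```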

Kalle, did you just rip that from the Andrea website?

But I agree, it is a great diagram; I understood what "array" was all about after seeing it.

Thanks guys.

General Discussion / Re: How did you discover Vox Commando

« on: February 26, 2012, 03:39:29 AM »

What I meant was to focus on YouTube. I didn't say there are no videos there; I saw many of them (but only after I found out about VC). It is just a suggestion to focus on it.

For example, most of the videos either explain the program itself (as a tutorial) or showcase people's setups without mentioning VoxCommando in the description or the comments. There should be videos of setups for all the applications VC supports, to show how it can be useful and practical (and not just how cool it is). The title shouldn't be about VC itself but about its functionality (voice commanding..., hands-free control..., automation by voice using...), while the description should mention VC.

There is nothing wrong with what was made; it is just that you should focus on promoting it there based on its functionality, to get the attention of interested users.

Feature Requests / Re: noise-cancellation/post-processing

« on: February 25, 2012, 04:28:25 PM »

I totally agree with you, jitterjames. But noise reduction that subtracts a noise-only profile from noisy speech is a close approximation to the true cancellation you described (which is by all means better, as the noise is captured in real time and not pre-recorded). I have seen lots of software do it, and the results were acceptable to the human ear. I am not saying it is better or worse, or whether you should do it or not. I just think it is a good approximation that works, at least for the human ear (recording/playback). Whether it would be better for recognition, I am not so sure now, since you say it might be worse, and since I cannot try enabling/disabling it (the mic either has it or it doesn't).

You are right about the level. I tried that before, and it improved recognition a *little* bit. I still think the reason is that the noise becomes less audible. But I could be wrong.

I see you are using SAPI 5.3; I didn't know you relied on such a library (or that you could end up with such a great and flexible program on top of it). I think I underestimated it. Do you use .NET, and if so, which language (C++, C#, VB, ...)? Of course, only if you don't mind me asking.

Oh, and no, it is not a bug (issue), it is a feature.

But that doesn't mean I expect it to be implemented, as I know how hard driver-level hooks are to do (and you would end up with unsigned code). It's just that I wish for a hands-free and head/ear-free solution with high accuracy. I am willing to invest in the more expensive ones, but only if they prove to be as good as the head-mounted ones, but from a distance.

Thank you guys for the valuable discussion. I believe all my posts drifted away from the topic.

Feature Requests / Re: noise-cancellation/post-processing

« on: February 25, 2012, 10:06:40 AM »

Yes, true. But a simpler software processing algorithm might still bring improvements.

I have a noise-cancelling mic, and it works great. However, I would like to use one of the microphone arrays that have better coverage but poor noise cancellation. Can you suggest a good microphone array?

Also, can you suggest software (driver-level) for noise cancellation? Same with hardware: can you suggest a device that can be used with a PC (USB, maybe?) that accepts any mic and cancels its noise?

Thanks.

General Discussion / Re: How did you discover Vox Commando

« on: February 25, 2012, 09:07:29 AM »

^^^

Just like Jimmy, I learned about it on YouTube by accident while watching something totally unrelated (but about XBMC). The video was not about VC, but one of the comments mentioned it, which made me google it out of interest.

You should use YouTube more to promote this great product. Some people don't see the need for such a thing until they see a use case for it.

Feature Requests / noise-cancellation/post-processing

« on: February 25, 2012, 08:44:09 AM »

Hello,

While testing many mics, I realized that what affects accuracy the most is not the level of the voice but rather its clarity. Some mics provide great coverage but have a high noise level, which makes detection impossible even when you are close to the mic.
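One common way to put a number on "clarity" is the signal-to-noise ratio, SNR = 10 * log10(P_speech / P_noise) in dB. The sketch below uses made-up sample values to show why raising the level alone does not help: a gain applied to the whole input scales speech and noise equally, so the ratio stays the same.

```python
import math


def mean_power(samples):
    """Average power (mean of squared sample values)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


def apply_gain(samples, gain):
    """Scale every sample, as turning up the mic level would."""
    return [gain * s for s in samples]


speech = [0.5, -0.4, 0.3, -0.5]      # made-up sample values for illustration
noise = [0.05, -0.05, 0.05, -0.05]
before = snr_db(speech, noise)
after = snr_db(apply_gain(speech, 4.0), apply_gain(noise, 4.0))  # same ratio
```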

Wouldn't it be great if VC could do noise cancellation to improve accuracy? It could be trained to detect the noise of the mic. For example, the user records a few seconds of sample audio in quiet conditions (to capture the noise), and this becomes a noise profile for the mic. This profile is then subtracted from any further recorded/heard speech. While this will not remove all the noise, it will result in a much clearer voice. Another option is to filter out low/high frequency bands, allow the use of different sample rates, and so forth. You could also have this step use an external utility specialized in this kind of processing.
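The noise-profile idea described here is essentially basic spectral subtraction. A pure-Python sketch of it, assuming audio is already split into equal-length frames of float samples: it averages the magnitude spectrum of frames recorded in silence, then subtracts that from each speech frame's magnitude while keeping the phase. The naive O(n²) DFT keeps it dependency-free; a real tool would use an FFT with overlapping windowed frames, and nothing here reflects how VC or any particular product does it.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2), fine for a sketch)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def noise_profile(noise_frames):
    """Average magnitude per frequency bin over frames recorded in quiet conditions."""
    spectra = [dft(f) for f in noise_frames]
    n = len(spectra[0])
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(n)]


def spectral_subtract(frame, profile):
    """Subtract the noise magnitude in each bin, keeping the original phase."""
    cleaned = []
    for x, noise_mag in zip(dft(frame), profile):
        mag = max(abs(x) - noise_mag, 0.0)  # clamp: a magnitude cannot go negative
        cleaned.append(cmath.rect(mag, cmath.phase(x)))
    return idft(cleaned)
```

Clamping at zero is what makes this an approximation: it removes the average noise, not the actual noise in each frame, which is why it tends to leave some residual artifacts.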

General Discussion / Re: GetRandomP issue

« on: February 24, 2012, 02:38:16 PM »

Just tested GetRandomFile with PlayWav, and it is way better than my previous approach.

Yes, I guess we are going off topic. I'll finish the utility first, and then think about how to communicate with VC. I'll make sure to contact you by then.

Thanks jitterjames.

General Discussion / Re: GetRandomP issue

« on: February 24, 2012, 11:20:16 AM »

Thanks, man, this is really great. I'll make sure to test them and use them instead of the payload method. I'll also try the Bing plugin again.

As for Arabic not working, did you encode the URL parameters before sending the HTTP request? Most Latin-based languages are easy to pass, but other languages require you to escape the characters (they end up looking something like %D0%F5%66%45...). This way the server receives them properly and unescapes them (if the server supports other languages/UTF-8, etc.). Most current browsers do this transparently without changing the URL you see, but in reality they encode it when they send the request.
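For reference, this is what that escaping looks like with Python's standard `urllib.parse`. The endpoint URL and the `text`/`lang` parameter names are placeholders, not the actual service VC talks to.

```python
from urllib.parse import quote, unquote, urlencode

text = "\u0645\u0631\u062d\u0628\u0627"  # the Arabic word "marhaba" (hello)

# Percent-encode the UTF-8 bytes of the text for use in a URL.
encoded = quote(text)
print(encoded)  # %D9%85%D8%B1%D8%AD%D8%A8%D8%A7

# urlencode() builds a whole query string, escaping each value the same way.
query = urlencode({"text": text, "lang": "ar"})
url = "http://example.com/translate?" + query  # placeholder endpoint and parameter names

# The server side reverses the escaping to get the original text back.
decoded = unquote(encoded)
```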

Your Arabic text example in the XML file for googlespeak was really funny; let me translate it back to you: "This is the biggest program any time passed create"

I know, I know, blame the translator

I will see if I can come up with anything, but the problem is that I must use UDP for message passing. Can't you just implement a way to listen to the output stream of another process (mine) and get either the binary audio (for TTS) or text (for speech recognition)? For the latter I could play the audio on behalf of VC, but for the former it is impossible to pass the audio from VC, or the text back to VC, without writing a plugin and/or using UDP. I'll see what I can do.
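A minimal sketch of passing text messages over UDP with Python's `socket` module. The message content, the UTF-8 encoding and the use of an OS-assigned port are assumptions for the demo; this does not describe VC's own UDP interface or its message format.

```python
import socket


def make_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port (demo assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_text(text, addr):
    """Send one UTF-8 encoded text message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(text.encode("utf-8"), addr)


def receive_text(sock, bufsize=65535):
    """Block until one datagram arrives and decode it as UTF-8 text."""
    data, _sender = sock.recvfrom(bufsize)
    return data.decode("utf-8")
```

UDP datagrams are small and unacknowledged, so this suits short text like recognized phrases better than streaming binary TTS audio.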

Thanks again for the update.

General Discussion / Re: GetRandomP issue

« on: February 23, 2012, 07:22:13 PM »

I really appreciate you looking into my request. You should note that what I meant by "prefix" is "prefix" and not "postfix".

I meant a common string that they start with (but differ in the ending). However, your postfix or extension idea is also great.

As for the secret command, I am not sure whether you are serious or not, as the secret command doesn't work even if I enable the Bing add-on (only speaksync/speak). Unless you are hinting at an upcoming function for the thing I explained to you elsewhere, and you are going to call it the secret command?

If that is the case, then I could help by providing a command-line utility that automates it, if you like. I was intending to create it for other things anyway, but if it helps here, then why not. Can you call other applications and pass parameters to them from VC? Also, although not required, can VC read the output stream of such a called application (it would be a great way to pass results back)?
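Calling another program with parameters and reading its output stream looks like this with Python's `subprocess.run`. It only illustrates the pattern being asked about; whether and how VC exposes it is not covered here, and the child command in the example is a stand-in for the proposed utility.

```python
import subprocess
import sys


def run_and_capture(args, timeout=10):
    """Run a command with arguments and return its standard output as text.

    Raises CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # decode the output stream as UTF-8 text
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Stand-in child process: prints its first argument in upper case.
    print(run_and_capture([sys.executable, "-c", "import sys; print(sys.argv[1].upper())", "hello"]))
```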

General Discussion / Re: GetRandomP issue

« on: February 23, 2012, 04:32:06 PM »

Can I ask you to implement another feature?

Can you make another action like "PlayWav", let's call it "PlayRandomWav", where you specify the directory containing the wav files and it plays one of them at random? More ideas: allow the user to specify a prefix for the files (and maybe several prefixes separated by |, ; or ,) so that only files with these prefixes are selected from the directory. Also, return the played file name as a result (i.e., in {LastResult} or {Match.1}, etc.).
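The file-selection part of the proposed "PlayRandomWav" could look roughly like this. `pick_random_wav` is a hypothetical helper, not a VC action: it accepts prefixes separated by |, ; or , and returns the chosen file name (what would go into {LastResult}), or `None` when nothing matches.

```python
import os
import random
import re


def pick_random_wav(directory, prefixes="", rng=random):
    """Pick a random .wav file name from `directory`.

    `prefixes` may list several prefixes separated by |, ; or , and when
    given, only files starting with one of them are considered.
    Returns the chosen file name, or None if nothing matches.
    """
    wanted = [p.strip() for p in re.split(r"[|;,]", prefixes) if p.strip()]
    candidates = [
        name for name in os.listdir(directory)
        if name.lower().endswith(".wav")
        and (not wanted or name.startswith(tuple(wanted)))
    ]
    if not candidates:
        return None
    # Sort first so a seeded rng gives repeatable picks.
    return rng.choice(sorted(candidates))
```

On Windows, the standard library's `winsound.PlaySound(path, winsound.SND_FILENAME)` could then play the chosen file.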

This would be really helpful for the hopeless people without TTS, and it would save us a lot of time (instead of creating payload XML files and updating them each time). It also qualifies as a great feature for other people and can be used for other purposes.

Thanks
