Author Topic: VR recognition question  (Read 2513 times)


AshaiRey

  • Jr. Member
  • **
  • Posts: 5
  • Karma: 1
    • View Profile
VR recognition question
« on: October 27, 2016, 07:09:58 AM »
I am a bit confused here.
I am using VoxCommandoSP and have made a command tree with a few commands in Dutch. All the commands work: VC picks them up and runs the macros behind them.
So far so good. However, I often notice that the sentence is correctly recognized, as shown in the top bar of the interface, but the command won't trigger. The history panel tells me vc. notrecogized. I've set the confidence level as low as 5 but there is still no improvement.

It seems to me that the speech recognizer detects the right words, but the confidence level of that detection is usually lower than the confidence level set in the GUI.

jitterjames

  • Administrator
  • Hero Member
  • *****
  • Posts: 7715
  • Karma: 116
    • View Profile
    • VoxCommando
Re: VR recognition question
« Reply #1 on: October 27, 2016, 09:13:56 AM »
Hi AshaiRey,

I can see why this is confusing for you.  This is a tough one to diagnose and I don't have any personal experience with the Dutch engine, but usually this happens when we say the command incorrectly. Perhaps there is something about the way you have structured your phrases and payloads that is causing this.

If I could see a log of this happening and also be able to view your command tree XML I might be able to suggest something.

Or if you are interested and you are comfortable speaking English to me, it might be more efficient for us to connect using TeamViewer so that I can see what is happening and try to diagnose the problem directly.

There are two things you should know that may alleviate some of your confusion.

1) Microsoft has an internal threshold for command confidence. I think it is at around 40 percent. We must have a confidence higher than this value before VoxCommando even gets a recognized event triggered, at which point VoxCommando checks the required confidence against the confidence of the recognized command. I may be able to implement a workaround for this but normally it's not a problem and we don't want to set our required confidence that low.

2) The text at the top as we are speaking is "guessed text". Microsoft's speech engine generates several events as commands are partially understood. But these events do not indicate that a full command was understood completely. That is why I wonder if maybe your commands require you to say a final word or words in order to complete the command. If you are new to VoxCommando maybe you have misunderstood the rules of building command phrases.
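
To make the two points above concrete, here is a minimal Python sketch of the two-stage check. It is only an illustration under stated assumptions, not VoxCommando's or Microsoft's actual code: the `Result` type, the `handle` function, and the returned labels are hypothetical, and the engine threshold of 40 is just the approximate figure mentioned in point 1.

```python
from dataclasses import dataclass

# Assumed value: the post only says "around 40 percent".
ENGINE_THRESHOLD = 40


@dataclass
class Result:
    text: str
    confidence: int  # 0..100, same scale as the GUI setting (assumption)
    final: bool      # False for partial "guessed text" events


def handle(result: Result, required_confidence: int) -> str:
    """Return what a host application would do with one engine result."""
    if not result.final:
        # Partial hypothesis: shown in the top bar, never triggers a command.
        return "display-only"
    if result.confidence < ENGINE_THRESHOLD:
        # The engine rejects it before the application's own setting is consulted.
        return "not-recognized"
    if result.confidence < required_confidence:
        return "rejected-by-app"
    return "trigger"


# A full phrase heard at confidence 30 is rejected even with the GUI set to 5.
print(handle(Result("hoe laat is het", 30, True), required_confidence=5))
```

Under these assumptions, lowering the required confidence below the engine's own threshold changes nothing, which matches the behavior described in the first post.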
« Last Edit: October 27, 2016, 11:35:14 AM by jitterjames »

AshaiRey

Re: VR recognition question
« Reply #2 on: October 27, 2016, 03:31:37 PM »
Hi JitterJames,

Thanks for your help so far.
First a little background. A few years ago I wrote my own VR client because I needed one that understands Dutch and runs on Win7, so it's based on the MS SR11. With it I had my share of quirks and wondered why things worked the way they did and not according to the MS docs. One of those things is the confidence parameter you can set in C# code. I know it's a weighted value and not a percentage, but I never had the feeling that it worked as it should with the Dutch language files. Having said this, I didn't compare it with the English language files, so that may be an experiment for later. My client also shows another odd behavior. It looks like the sensitivity, or the confidence level so to say, isn't fixed but behaves cyclically. Over a few weeks the hit rate decreases until it becomes nearly deaf to all commands except the attention phrase, and then suddenly it starts over, hypersensitive and triggering far too often.

I had looked at VoxCommando before, but I noticed recently that it now also supports Dutch, so I am looking into whether it could be a replacement for my own code. So, back to the things that matter.

Quote
I can see why this is confusing for you.  This is a tough one to diagnose and I don't have any personal experience with the Dutch engine, but usually this happens when we say the command incorrectly. Perhaps there is something about the way you have structured your phrases and payloads that is causing this.

If I could see a log of this happening and also be able to view your command tree XML I might be able to suggest something.

Because I am just testing this, I keep things really simple to keep the number of variables low. I'll see what kind of logging and XML I can attach.

Quote
Or if you are interested and you are comfortable speaking English to me, it might be more efficient for us to connect using TeamViewer so that I can see what is happening and try to diagnose the problem directly.
I am not comfortable with that because it's running on a dedicated home automation server. If I can sort this out I will install it on a different machine and have a go.

Quote
There are two things you should know that may alleviate some of your confusion.
1) Microsoft has an internal threshold for command confidence. I think it is at around 40 percent. We must have a confidence higher than this value before VoxCommando even gets a recognized event triggered
Interesting piece of info. I didn't know this.
So lowering it below 40 has no effect anymore.

Quote
2) The text at the top as we are speaking is "guessed text". Microsoft's speech engine generates several events as commands are partially understood. But these events do not indicate that a full command was understood completely. That is why I wonder if maybe your commands require you to say a final word or words in order to complete the command. If you are new to VoxCommando maybe you have misunderstood the rules of building command phrases.
OK, this was a misunderstanding of mine. I thought that the text at the top was what the engine heard and would be used to check against the rules (grammar).

OK, I prepared some files.
First, a cleaned-up XML reduced to the minimum.
I use an attention phrase, "attention Willie in the kitchen" (attentie willie in de keuken). This triggers an event in my home automation system, where everything is switched for audio in the kitchen.

While I have attention, I ask for the time: "how late is it" (hoe laat is het).

Next I tell the system to end the session: "go to sleep" (Ga maar slapen).

The controls I use are scrape and paramRaw. I want to see if using VBScript instead of HTTP will speed things up a bit, so never mind those actions.

I use high-end boundary mics with an XAP800 for filtering and for noise and echo cancelling. I get a clear and crisp sound at the mic input on the PC. The mic levels on the PC is at about 40-50%.

Edit.
I just did a new test with voxcommando.exe and the English engine. I replaced the Dutch phrases with English ones and the response is far, far better now. It seems that the Dutch grammar files are of poor quality.
« Last Edit: October 27, 2016, 03:45:48 PM by AshaiRey »

jitterjames

Re: VR recognition question
« Reply #3 on: November 02, 2016, 12:09:43 PM »
Sorry for the delay in getting back to you.  I did look at your tree and log, although the log is not really that helpful in this case.

I can't speak Dutch so it's difficult for me to do much testing.  Today I decided to see what I could manage and I came up with this:



I downloaded some (not very good quality) audio clips from the internet and added a command with a payload xml and used phrases to match the audio clips.  I'm playing the clips on my crappy built-in monitor speakers, and using an inexpensive (but not bad) microphone on my Logitech USB webcam which is about 12 years old.

For the most part it seems to be recognizing all the phrases OK although it does better with the male voice.

While it does not surprise me to hear that the English engine works better than the Dutch one, you can see that the Dutch recognizer in SP does work.  I do not know the internal workings of the MS speech engines but if you are seeing varying results with it, then the first suspect that I would investigate is your audio setup.  Acoustic echo cancellation, noise reduction, and automatic gain control, are all known by us to (in some cases) cause more problems with speech recognition than they solve.  It would also explain why you see different results on different days.  The speech engine used in SP does not do any adaptive learning like the regular engine, so I would expect it to always give the same results when it hears the same audio.  Of course it is a bit of a "black box" so I can't say for sure.  It may well  have some adaptive mechanisms in place for things like volume, and it may even have a male mode and a female mode.  Who knows?  I also don't know what your accent is like relative to these online samples I'm using.

One thing to note which is not obvious.  Having very few phrases in your overall command structure can actually cause problems with the accuracy of the confidence values returned by VC.  You will usually see better accuracy in your confidence when you use a larger set of commands because the confidence value is all relative to the other phrases that you might be able to say.  There is really no problem having a few thousand different voice commands as long as you don't have commands using phrases that sound very similar to each other.
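
The idea that confidence is relative to the other phrases in the grammar can be illustrated with a toy Python sketch. This is not how the Microsoft engine computes confidence (that is a black box here). It uses plain text similarity from `difflib` as a stand-in for acoustic similarity, and the function names and English example phrases are made up for illustration. It only shows why similar-sounding phrases hurt, and does not try to reproduce the small-grammar effect.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1], used here as a crude proxy for how alike two phrases sound."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match_with_margin(heard: str, phrases: list[str]) -> tuple[str, float]:
    """Return the best-matching phrase and its lead over the runner-up.

    Needs at least two phrases. A small margin means the match is ambiguous.
    """
    scored = sorted(((similarity(heard, p), p) for p in phrases), reverse=True)
    (best_score, best_phrase), (runner_up_score, _) = scored[0], scored[1]
    return best_phrase, best_score - runner_up_score


distinct = ["turn on the light", "what time is it", "go to sleep"]
crowded = distinct + ["turn on the lights"]  # sounds almost the same as the first phrase

print(best_match_with_margin("turn on the light", distinct))
print(best_match_with_margin("turn on the light", crowded))
```

In this toy model, adding unrelated phrases leaves the margin untouched, while a near-duplicate phrase collapses it. That is consistent with the advice above that a large command set is fine as long as the phrases do not sound alike.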

Another thing is that you may have your volume levels too high.  I can't know this but you said "The mic levels on the PC is at about 40-50%" and it might be better if the levels are more like 20% to 30%.  Some trial and error will tell but usually people think that louder is better and it isn't.

I recently made some changes to VoxCommando that could help you in the case where you want to lower the confidence to a very low level.  It will be included in the next release although I'm not sure at this point when that will be.


jitterjames

Re: VR recognition question
« Reply #4 on: November 02, 2016, 12:15:21 PM »
Oh, I almost forgot.

It looks like you are not using a prefix and I really recommend that you use one.  It will not only help to reduce false positives, it will also help to improve your confidence values for valid commands.  I know you are just testing but if you are going to use an always on open microphone with multiple inputs then a prefix is pretty much mandatory anyway.

AshaiRey

Re: VR recognition question
« Reply #5 on: November 03, 2016, 07:07:08 AM »
Quote
Oh, I almost forgot.

It looks like you are not using a prefix and I really recommend that you use one.  It will not only help to reduce false positives, it will also help to improve your confidence values for valid commands.  I know you are just testing but if you are going to use an always on open microphone with multiple inputs then a prefix is pretty much mandatory anyway.

Thanks for the time you have invested in this. I will do some more testing and come back to it. However, I would like to comment on this already. I do use always-on open-air mics: 5 mics, one in each room. I stepped away from using the gating info that the XAP800 mixer gave me because it was too unreliable to feed into my system. Instead I use an attention phrase for each room, like Attention Willy in the [room]. I could use [Attention Willy] as a prefix. However, I noticed an option in VC that I believe is also right to use. All commands are in standby mode but the 5 attention phrases are marked as always ON. I believe this will prevent many false positives during normal conversations in the room, as they will only trigger when VC is set to ON. A test I did with this showed me that it was working as expected.

Quote
Another thing is that you may have your volume levels too high.  I can't know this but you said "The mic levels on the PC is at about 40-50%" and it might be better if the levels are more like 20% to 30%.  Some trial and error will tell but usually people think that louder is better and it isn't.
I am aware of not setting the sound levels too high. The max 40-50% I mentioned is at the PC mic input while I speak normally at a distance of 30 cm. On average I speak at a distance of 150 cm and then the levels are around 20%. In some rooms I can speak from up to about 6 meters away and I will still be heard. Because of this one-mic-per-room setup I have to allow for some slack in the sound levels. I've balanced every room in the mixer to give me the same sound levels and quality. For this I used 30 m of shielded wire, attached at one end to the output of the XAP800 mixer that leads into the PC mic in. On the other end I connected some good headphones. Thanks to this long wire I am able to go to every room and listen to what I say there. Using a laptop, I adjusted the settings for each room until the levels were not too high, there was no echo and no noise, and the background radio was cancelled out correctly, so that I only hear my voice, crisp and clear.

This setup has been working fine for me for several years now. The only trouble I have is that VR degrades over the course of a couple of weeks (not using VC, btw) and then the next day it's working 100% again, only to start degrading again. Software restarts and PC reboots have no effect on this linear degradation, so I suspect that the VR engine is storing things internally that even survive reboots.

I will do a more extended test with your remarks in mind and come back to you.
Thanks again for your effort so far.

Btw, one line in your test set made me laugh so hard.
This one: Sterf-op-straat Worst
In English it would be: Die on the street Sausage  :biglaugh
« Last Edit: November 03, 2016, 07:11:03 AM by AshaiRey »

jitterjames

Re: VR recognition question
« Reply #6 on: November 03, 2016, 11:57:43 AM »
Quote
Instead I use an attention phrase for each room, like Attention Willy in the [room]. I could use [Attention Willy] as a prefix. However, I noticed an option in VC that I believe is also right to use. All commands are in standby mode but the 5 attention phrases are marked as always ON. I believe this will prevent many false positives during normal conversations in the room, as they will only trigger when VC is set to ON. A test I did with this showed me that it was working as expected.

Yes this is an option  (provided that you return VC to standby mode after you issue your commands) and you can set it up however you want.  What I am suggesting is that by using a prefix you will not just reduce false positives, but you will also get a more accurate and higher confidence when you say a valid voice command. This has been my experience in any case.

The male voice with the sausage phrase comes from here: http://www.heardutchhere.net/forlaffs.html
« Last Edit: November 03, 2016, 01:49:07 PM by jitterjames »

PegLegTV

  • $upporter
  • Hero Member
  • *****
  • Posts: 500
  • Karma: 43
    • View Profile
Re: VR recognition question
« Reply #7 on: November 03, 2016, 04:26:10 PM »
how could you? ::yikes :'(

LOL   :biglaugh
« Last Edit: August 28, 2017, 08:01:11 PM by PegLegTV »

jitterjames

Re: VR recognition question
« Reply #8 on: November 03, 2016, 04:31:06 PM »
Awesome.   :biglaugh

You just need to add a chalk outline on the sidewalk.

AshaiRey

Re: VR recognition question
« Reply #9 on: November 04, 2016, 06:42:33 AM »
Well, I've added more commands for this test and it definitely improves the hit rate, so this gives me enough confidence to keep working on integrating it into my system.

Using a prefix seems to be a hot item. As I understand it, when you use a prefix you can give a command directly without first saying the attention phrase.
There are 2 issues that I have to consider.
The first one is the WAF (wife acceptance factor). If the wife doesn't like it then I don't have a happy home.  ::) She already has a hard time using voice commands because her pitch and volume are quite different. Any change that lowers the hit rate even a little bit is a no-go.
The second is more technical. As said, I use 5 mics in different rooms to listen for the attention phrase. I don't use the gating info from the XAP800 since it wasn't reliable enough for me. So the attention phrase has to contain the room you are in. When an attention phrase is detected, all the other mics are muted. This prevents interference with the voice commands and gives a much higher hit rate.

I use a separate command to put VR back into listening mode (standby) and unmute all the mics again.
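
The flow described here (standby until a room-specific attention phrase is heard, mute the other rooms, then return to standby on a separate command) can be sketched as a small state machine in Python. This is only a rough model under assumptions: the class and method names are hypothetical, the phrases are the English translations used earlier in the thread, the room names are invented, and the real muting would be done by the home automation system and mixer, not by this code.

```python
class AttentionController:
    """Toy model of the attention-phrase / standby flow."""

    PREFIX = "attention willie in the "  # assumed English form of the attention phrase
    SLEEP = "go to sleep"                # assumed command that ends the session

    def __init__(self, rooms):
        self.rooms = set(rooms)
        self.active_room = None  # None means standby: only attention phrases count

    @property
    def muted(self):
        """Rooms whose mics would be muted in the current state."""
        if self.active_room is None:
            return set()
        return self.rooms - {self.active_room}

    def hear(self, phrase: str) -> str:
        phrase = phrase.lower().strip()
        if self.active_room is None:
            # Standby: the room is taken from the phrase itself, not from mic gating.
            if phrase.startswith(self.PREFIX):
                room = phrase[len(self.PREFIX):]
                if room in self.rooms:
                    self.active_room = room
                    return "attention"
            return "ignored"
        if phrase == self.SLEEP:
            self.active_room = None  # back to standby, all mics unmuted
            return "standby"
        return "command:" + phrase


ctl = AttentionController(["kitchen", "living room", "bedroom"])
print(ctl.hear("how late is it"))                   # ignored while in standby
print(ctl.hear("Attention Willie in the kitchen"))  # attention
print(sorted(ctl.muted))                            # the other rooms
print(ctl.hear("how late is it"))                   # now treated as a command
print(ctl.hear("go to sleep"))                      # standby
```

A spoken prefix on each command, as suggested earlier in the thread, could be layered on top of this without changing the state machine.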

jitterjames

Re: VR recognition question
« Reply #10 on: November 04, 2016, 11:58:47 AM »
I suggested the prefix in order to increase the hit rate and therefore the WAF. You do not need to replace your other systems. They can be used together with a prefix. The prefix should be easy to say and sound natural at the beginning of a sentence. A name with 2 or 3 syllables is recommended. Alexa is very good for example. Try a few different ones and see what works and feels best.