VoxCommando

Topic started by: Haddood on May 28, 2014, 04:40:28 AM

Title: Voice recognition (biometrics)
Post by: Haddood on May 28, 2014, 04:40:28 AM
Finally I found an open source library for voice recognition

http://mistral.univ-avignon.fr/mediawiki/index.php/Main_Page#The_ALIZE_Project

Can this be developed into a plugin? That way VC would know who is speaking, not only what is being said :D

This level of programming is way beyond my capacity ... or I would have given it a try
Title: Re: Voice recognition (biometrics)
Post by: jitterjames on May 28, 2014, 08:54:40 AM
What would you actually use this for?  Surely not for security, and for voice biometrics to work the subject must generally speak a known pass-phrase.  How is this more useful than simply using our existing speech recognition to tell the computer who you are?
Title: Re: Voice recognition (biometrics)
Post by: Haddood on May 28, 2014, 10:00:43 AM
For example, if the pass phrase is the same as the prefix: when I say good morning, VC will ask me if I want the news, Bixi, etc. If my GF says good morning, it will tell her the weather, etc., or read out the to-do list for that person. And start the coffee machine or the kettle based on who spoke ....

Coming home it will know who said I am home ... If it is secure enough, saying I am back in Skype or mumble or even intercoms will unlock the house door....

When saying play some music.... It will know which genre to play...

Saying turn off my room light ... It will know which room's light to turn off
Open Netflix ... Will open the right profile ...

Apart from that, switching SAPI 5 profiles automatically based on who is speaking would increase the accuracy.... MSSP is good, but with no dictation a lot of possibilities are eliminated...

Whenever there is a user profile, there will be an application

These are just a few applications off the top of my head

I think it is one step toward VC with more AI  ;D ;D
Title: Re: Voice recognition (biometrics)
Post by: jitterjames on May 28, 2014, 10:58:36 AM
My point is that I am pretty sure you won't want to use it for real security related commands like unlocking a door or disabling an alarm, so the only real application is to tell it who you are in order to customize your commands.  But we can already do this very easily.

You can have a phrase "honey I'm home" which will tell it that it is you (Hadood), and another phrase "I am back darling" which will tell it that it is your wife.  Either way you each have to remember a phrase, the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

In any case I also do not have the skills to adapt that library to .Net but I for one would not use it, nor would I want to waste all the extra processing power on such a thing unless it could reliably be used for security related tasks.

If you find a free .Net library, or someone creates a .Net wrapper for this one, then I will take another look.  Or if this program could be made as a standalone that is able to send UDP messages to VC through the network that would work too.
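
A minimal sketch of the standalone idea, in Python: a separate program identifies the speaker and then sends a short UDP message to VC. This assumes VC is configured to accept UDP messages; the port number, the `Speaker.<name>` event naming and both function names are placeholders made up for illustration, not anything defined by VC or by the ALIZE library.

```python
import socket

# Assumption: VoxCommando is listening for UDP messages on this host/port.
# 33000 is only a placeholder; use the port set in your own VC options.
VC_HOST = "127.0.0.1"
VC_PORT = 33000


def speaker_event(speaker: str) -> str:
    """Build a hypothetical event name such as 'Speaker.Haddood'."""
    return f"Speaker.{speaker}"


def send_vc_event(message: str, host: str = VC_HOST, port: int = VC_PORT) -> int:
    """Send one UDP datagram containing `message`; return the bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))
```

The recognition program would call something like `send_vc_event(speaker_event("Haddood"))` once it has decided who spoke, and a VC command listening for that event would do the rest.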
Title: Re: Voice recognition (biometrics)
Post by: Haddood on May 28, 2014, 12:55:15 PM
Quote from: jitterjames
My point is that I am pretty sure you won't want to use it for real security related commands like unlocking a door or disabling an alarm

James,
in fact there are commercial solutions targeting enterprise clients for voice biometrics, including identifying clients in call centers ... redundant security ... face recognition can be tricked with a photo, but combined with voice recognition it becomes harder ... anyway this is out of VC's scope ... just wanted to say that the technology in general seems pretty reliable (not necessarily this library)

Quote from: jitterjames
You can have a phrase "honey I'm home" which will tell it that it is you (Hadood), and another phrase "I am back darling" which will tell it that it is your wife.  Either way you each have to remember a phrase, the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

Interesting approach. In fact I already programmed a command to switch the SR profile. However, it has a few limitations and drawbacks, like duplicating many commands, and switching the SR profile on the fly is going to be hard to achieve...

Quote from: jitterjames
If you find a free .Net library, or a someone creates a .Net wrapper for this one, then I will take another look.

Good to know; it will help me find the right stuff in the future.

Quote from: jitterjames
if this program could be made as a standalone that is able to send udp messages to VC through the network that would work too.

I was thinking of that as a possible substitute for a plugin: the program running in the background and triggering events in VC by UDP or by command line with parameters

Maybe someone in this forum with the right skills will volunteer for this task
Title: Re: Voice recognition (biometrics)
Post by: nime5ter on May 28, 2014, 03:49:24 PM
Quote from: Haddood
Interesting approach, in fact I already programed a command to switch the SR profile. however it has few limitations and draw backs, like duplicating many commands, switching SR Profile on the fly going to be hard to achieve...

Is there a specific reason you've chosen to duplicate commands when switching profiles?

Maybe I'm misunderstanding, but I can't think of a technical reason why one would need to do that. The same commands can be used for multiple users even if they use their own SR profiles. VC.SetProfile should be possible to implement on the fly.

In terms of the big picture here, this is sounding similar to discussions I've had in the past on the forum. Just as one example, I can imagine the following scenario being pretty much the same in a two-person home, whether or not one is using voice biometrics:

1> Command used to confirm identity -- requires either a predetermined biometric pass phrase or a predetermined non-biometric pass phrase (obviously the latter will fail only if the user doesn't know the right phrase, rather than based on audio analysis).

Either way: This command sets the speech profile for the right user (if this seems necessary), while also enabling user-specific groups and disabling other groups that are specific to another user. By naming groups consistently, this is quite straightforward.

There could still be other "neutral" groups in this common configuration that both parties can use. These are left untouched.

2> If desired, the above command could also then trigger follow-up commands depending on which user is identified to continue a personalized dialogue. This could even be customized to the time of day or whatever.

... Thought I'd throw that out there for one and all, in case it's something that someone wants to try to implement at home. :)
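
To make the group-switching step concrete, here is a rough sketch in Python of the bookkeeping only. It assumes user-specific groups are named `<user>_<something>` (the underscore convention, the group names and the function name are all invented for illustration); in VC itself the enabling and disabling would be done with actions inside the identity command.

```python
def plan_group_switch(all_groups, users, active_user, sep="_"):
    """Return (enable, disable, untouched) group-name lists for active_user.

    A group counts as user-specific when its name starts with '<user><sep>'.
    Anything else is treated as a neutral group and is left alone.
    """
    enable, disable, untouched = [], [], []
    for group in all_groups:
        # Find which known user (if any) owns this group by its name prefix.
        owner = next((u for u in users if group.startswith(u + sep)), None)
        if owner is None:
            untouched.append(group)
        elif owner == active_user:
            enable.append(group)
        else:
            disable.append(group)
    return enable, disable, untouched
```

With consistent naming, adding a third user only means adding their name to `users`; the neutral groups never need to be listed anywhere.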


Title: Re: Voice recognition (biometrics)
Post by: jitterjames on May 28, 2014, 05:41:46 PM
Quote from: Haddood
Interesting approach, in fact I already programed a command to switch the SR profile. however it has few limitations and draw backs, like duplicating many commands, switching SR Profile on the fly going to be hard to achieve...

Quote from: nime5ter
VC.SetProfile should be possible to implement on the fly.

I guess it depends on what one means by "on the fly".  The VC.SetProfile action performs the switch very quickly, and does not require you to make any other changes, but it can't be done while you are in the middle of speaking a voice command (so forget trying to use a prefix event).  This will hold true with or without biometrics, or any other method of initiating the switch.  The recognition engine needs to be turned off, the profile changed, and then the engine is turned back on.  All of this is done automatically by the action VC.SetProfile.
Title: Re: Voice recognition (biometrics)
Post by: Haddood on May 29, 2014, 06:37:35 PM
Quote from: nime5ter
Is there a specific reason you've chosen to duplicate commands when switching profiles?

I think I wasn't clear in my post. I do not have duplicate commands at the moment. I meant that James's suggestion would create duplicate commands.

Quote from: jitterjames
You can have a phrase "honey I'm home" which will tell it that it is you (Hadood), and another phrase "I am back darling" which will tell it that it is your wife.  Either way you each have to remember a phrase, the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

Accordingly, I would need something similar to open Netflix, music, etc. Adding my son to the equation will make it even more cumbersome ... Of course this is not essential, and can be solved by just switching profiles and giving commands with no personal reference ... like play music for Haddood, etc. It just might feel awkward depending on the phrase ...


Quote from: jitterjames
I guess it depends on what one means by "on the fly".  The VC.SetProfile action performs the switch very quickly, and does not require you to make any other changes, but it can't be done while you are in the middle of speaking a voice command (so forget trying to use a prefix event).  This will hold true with or without biometrics, or any other method of initiating the switch.  The recognition engine needs to be turned off, the profile changed, and then the engine is turned back on.  All of this is done automatically  by the action VC.SetProfile

I did not know that. However, that can be worked around easily by saying one of the phrases that has a personal interpretation (i.e. my room). Depending on the voice recognition scenario:
1. Plugin: VC passes the wave info to the plugin and sets a variable; the plugin generates an event with the user ... this event triggers a command that, based on the user and the variable for the command, triggers the right action.
2. If the voice recognition is running in parallel to VC: VC sets a variable and stops, then VR triggers an event, and again a command does the branching ...

There could be other scenarios ... those are just off the top of my head ...

Finally, this might be daydreaming, as the VR technology (at least the one I found) might not be reliable at all ...
Title: Re: Voice recognition (biometrics)
Post by: jitterjames on May 29, 2014, 06:49:22 PM
I am officially lost.   :bonk

I don't know if it is because I am not understanding you or you are not understanding me but I can't follow any more.
Title: Re: Voice recognition (biometrics)
Post by: Haddood on May 30, 2014, 05:23:55 PM
No worries James ... I will dig into this a bit more once I have some time, and post my findings
Title: Re: Voice recognition (biometrics)
Post by: Haddood on March 25, 2015, 12:07:04 PM
Digging up an old topic .... I found that KeyLemon now supports speaker (voice) recognition as well as face recognition. They offer a REST API and even have a Python wrapper
https://developers.keylemon.com/documentation/reference/libraries/wrappers
https://developers.keylemon.com/documentation/developer/basic/entry_points

They offer a free account that will cover VC users

I can't get my head around the REST API ... I tried the Python wrapper but it needs another module (requests) and VC starts to generate errors ...

If anybody can figure this out, VC will acquire face and voice recognition  8) 8)

Title: Re: Voice recognition (biometrics)
Post by: nime5ter on March 25, 2015, 02:02:27 PM
That seems pretty neat, in theory.

If you can come up with a method to upload a voice recording on the fly to a specific, known URL, the basics are easy.

Start by recording a wav of 5 seconds or more for it to use as its base model (more than one wav might be better; I just used one). Upload the wav file.

Register on their site to get an API key & user name. I always put all that stuff in a map. In my example below the map is called keylemon.

Then you can create a "model" (for a particular person) using a Scrape.Post command similar to my example below.

You'd then have to have a system where a user says their special recognition phrase, which gets recorded and uploaded and compared to the model.

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<commandGroup open="True" name="keylemon" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="689" name="create Naomi model" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Creates a speaker model for Naomi based on a wav file uploaded to the web. I used a Dropbox link.&#xD;&#xA;&#xD;&#xA;I've created a map called keylemon, in which I store my username, api key, and models etc. when they're created. Replace {M:keylemon.username}&amp;key={M:keylemon.APIkey} with your own API key etc.&#xD;&#xA;&#xD;&#xA;This command will store the model ID number for &quot;Naomi&quot; in my keylemon map (created by a successful HTTP post request).">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/model/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls=URL PATH FOR YOUR WAVE FILE.wav&amp;name=Naomi</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>"model_id":\s"(.*?)".*?"name":\s"(.*?)"</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.MatchToMap</cmdType>
      <params>
        <param>keylemon</param>
        <param>True</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>create Naomi</phrase>
  </command>
  <command id="690" name="recognize Naomi speaking" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Analyzes a wav file uploaded somewhere online. You have to give it the target URL. Recognizer returns a &quot;score&quot; out of 100. It's up to you to choose how high a number constitutes a recognized voice.">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/recognize/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls=URL PATH TO WHEREVER YOU ARE POSTING YOUR VOICE RECORDING&amp;models={M:keylemon.Naomi}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegEx</cmdType>
      <params>
        <param>"score":\s(.*?)\}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>{Match.1}&amp;&amp;85</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Voice authorization for Naomi failed. Try again or go away.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
      </then>
      <else>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Welcome Naomi. Switching to your profile now.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>OSD.ShowText</cmdType>
          <params>
            <param>Welcome Naomi. Switching to your profile now.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
      </else>
    </if>
    <phrase>Test the voice recognition for Naomi</phrase>
  </command>
</commandGroup>
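
For anyone who would rather see the same two HTTP requests outside VC, here is a rough Python equivalent that only builds them and sends nothing. The endpoint paths and the `urls`, `name` and `models` fields are copied from the Scrape.Post actions above; the function names and the example values are made up, and the standard library is used so the `requests` module is not needed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL taken from the Scrape.Post actions in the commands above.
API_BASE = "https://api.keylemon.com/api/speaker"


def build_request(endpoint, username, api_key, fields):
    """Build (but do not send) a form-encoded POST like the Scrape.Post actions."""
    query = urlencode({"user": username, "key": api_key})
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{API_BASE}/{endpoint}/?{query}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def create_model_request(username, api_key, wav_url, name):
    """Request that creates a speaker model from a publicly reachable wav."""
    return build_request("model", username, api_key, {"urls": wav_url, "name": name})


def recognize_request(username, api_key, wav_url, model_ids):
    """Request that compares a wav against one or more model ids."""
    fields = {"urls": wav_url, "models": ",".join(model_ids)}
    return build_request("recognize", username, api_key, fields)
```

Passing the returned object to `urllib.request.urlopen()` would perform the actual call. Note that `urlencode` percent-encodes the wav URL and the commas, whereas the VC command sends them as typed.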
Title: Re: Voice recognition (biometrics)
Post by: Haddood on March 25, 2015, 04:37:51 PM
Wow that was fast  :clap
now time to play :)
Title: Re: Voice recognition (biometrics)
Post by: nime5ter on March 25, 2015, 05:07:22 PM
:) Play is the operative word.

The http request part is easy, which is all I posted above, as that was what you said was the problem for you. (Pretty much the same as Instapush and all the other examples on the forum, Haddood.  ::club)

The problem is the practical implementation of uploading your voice recordings all the time, and then the question of how good a job they do with their analysis, etc. I compared the same recording to itself using their API, and got a "score" of 99 or something like that. I haven't tried to compare two (or more) different recordings.

I'm a bit skeptical and also tend to be wary of Internet-dependent solutions. Still, a fun experiment!
Title: Re: Voice recognition (biometrics)
Post by: nime5ter on March 26, 2015, 11:21:28 AM
Last night I had time to expand the proof of concept. This uses VoxWav to record the "voice print" that gets uploaded to Dropbox.

Findings are:

a) You need pretty long voice recordings and pass phrases for this to work decently.
b) Depending on your Internet connection, the first attempt to recognize the voice print may not work and you'll have to re-try.
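
As a side note on finding (b), the manual re-try could in principle be automated. This is only a generic Python sketch of the pattern, not something VC provides: `action` stands for whatever function performs the recognition request and is assumed to return None on failure, and the attempt count and delay are arbitrary.

```python
import time


def retry(action, attempts=3, delay_s=2.0, sleep=time.sleep):
    """Call action() until it returns a non-None result or attempts run out."""
    for attempt in range(1, attempts + 1):
        result = action()
        if result is not None:
            return result
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            sleep(delay_s)
    return None
```

The `sleep` parameter is only there so the wait can be swapped out when experimenting.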

Note: If you're not using VoxWav with VC and/or you're not a -- let's say -- "advanced" VC user, this project should probably be avoided. ;-)
[Update: James has now added a standard action that you can use instead of VoxWav if you prefer: VcAdvanced.SaveRecoWav (http://voxcommando.com/mediawiki/index.php?title=Actions#SaveRecoWav)]

Four commands here:

1. Not really needed, but VoxWav users can use it to record their voice profile data if they want. This is used to create keylemon "models" (user profiles).

2. Use it to create user models. Sample command needs to be adapted to your user names and file paths etc.

3. "Begin User Authorization". See description in the command. This command automatically triggers the voice data processing command.

4. Voice processing command. Currently just tells you whether a voice was identified or not.

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<commandGroup open="True" name="keylemon" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="705" name="1 - record voice data for model creation" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Use with VoxWav to record your voice. Replace with your own file path. After, you can use these to create a speaker model (voice profile for a specific user). Recordings must be good quality and at least 4 or 5 seconds long.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Start talking.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TcpMic.SaveNext</cmdType>
      <params>
        <param>local file path to Dropbox folder\{1}.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>Record voice print for</phrase>
    <payloadList>James, Naomi</payloadList>
  </command>
  <command id="689" name="2 - create model for {1}" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Creates a speaker model for {1} based on a wav file (or files) uploaded to the web. I used a Dropbox link -- saved to my map as {M:keylemon.dropboxURL}. Separate multiple wav file URLs with commas. I've created a map called keylemon, in which I store my username, api key, and models etc.">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/model/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls={M:keylemon.dropboxURL}/{1}.wav&amp;name={1}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>"model_id":\s"(.*?)".*?"name":\s"(.*?)"</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.MatchToMap</cmdType>
      <params>
        <param>keylemon</param>
        <param>True</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>create model for</phrase>
    <payloadList>James, Naomi</payloadList>
  </command>
  <command id="690" name="Begin user authorization" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Works best if you use a pass phrase that matches one of your model phrases. Recognizer returns a &quot;score&quot; out of 100.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Please say your pass phrase now.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TcpMic.SaveNext</cmdType>
      <params>
        <param>LOCAL PATH TO YOUR DROPBOX\Public\keylemon\authUser.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>VC.SetEventTimer</cmdType>
      <params>
        <param>8s</param>
        <param>postvoicedata</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>Begin user authorization</phrase>
  </command>
  <command id="703" name="process voice data" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Compares voice data to my 2 user models (James and Naomi). Recognizer returns a &quot;score&quot; out of 100. Proof of concept. Doesn't do anything useful at the moment.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Processing data. Please wait.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/recognize/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls={M:keylemon.dropboxURL}/authUser.wav&amp;models={M:keylemon.Naomi},{M:keylemon.James}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="True">
      <ifType>LastActionSuccess</ifType>
      <ifParams>&amp;&amp;</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Voice processing failed.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>OSD.ShowText</cmdType>
          <params>
            <param>Processing failed. {CR} If you think your recording was good, say: "Re-try voice authorization"</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.StopMacro</cmdType>
          <params />
          <cmdRepeat>1</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <action>
      <cmdType>Results.RegEx</cmdType>
      <params>
        <param>"name":\s"(.*?)",\s"score":\s(.*?)\}</param>
        <param> - </param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>85&amp;&amp;{Match.1.2}</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Match found. Welcome {Match.1.1}. Switching to profile for {Match.1.1}</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.SetProfile</cmdType>
          <params>
            <param>{Match.1.1}</param>
          </params>
          <cmdRepeat>0</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>85&amp;&amp;{Match.2.2}</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Match found. Welcome {Match.2.1}. </param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.SetProfile</cmdType>
          <params>
            <param>{Match.2.1}</param>
          </params>
          <cmdRepeat>0</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <action>
      <cmdType>OSD.ShowText</cmdType>
      <params>
        <param>Results:</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>OSD.AddText</cmdType>
      <params>
        <param>{Match.{i}.1}: {Match.{i}.2}</param>
      </params>
      <cmdRepeat>{#M}</cmdRepeat>
    </action>
    <action>
      <cmdType>File.Delete</cmdType>
      <params>
        <param>LOCAL PATH TO YOUR DROPBOX\Public\keylemon\authUser.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <event>postvoicedata</event>
    <phrase>Re-try voice authorization</phrase>
  </command>
</commandGroup>
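
The score handling in the "process voice data" command can be sketched in Python as below. The regular expression and the cut-off of 85 are the ones used in the command; the sample response text is hypothetical and shaped only to fit that regex, and `best_match` is an invented helper. Unlike the VC command, which tests each match in turn, this sketch reports only the highest-scoring speaker.

```python
import re

# Same pattern as the Results.RegEx action in the "process voice data" command.
PATTERN = re.compile(r'"name":\s"(.*?)",\s"score":\s(.*?)\}')
THRESHOLD = 85  # same cut-off as the <if> blocks above


def best_match(response_text, threshold=THRESHOLD):
    """Return (name, score) of the top speaker scoring above threshold, else None."""
    scored = [(name, float(score)) for name, score in PATTERN.findall(response_text)]
    passing = [item for item in scored if item[1] > threshold]
    return max(passing, key=lambda item: item[1]) if passing else None


# Hypothetical response text, shaped only to fit the regex above.
SAMPLE = '[{"name": "Naomi", "score": 91}, {"name": "James", "score": 40}]'
```

Where exactly to set the threshold is a judgment call; 85 is simply the number used in the proof of concept.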

Users will need to get their own username and API key from the keylemon website.

POST SCRIPT: Although this will work for those who don't mind an Internet-reliant system, and has a "fun factor" to it, it has no genuine advantage that I can see over simply having a dedicated pass phrase for each user -- without using biometrics (as I described near the beginning of this thread). The simple pass phrase ("Call me Ishmael", "My name is Luka") does not require Internet or waiting for said Internet service or extra levels of potential failure.
Title: Re: Voice recognition (biometrics)
Post by: Haddood on March 26, 2015, 03:58:36 PM
I am having trouble sending the files through Dropbox, which made me think of using the web server in the TCP plugin ... that would even avoid the upload time Dropbox needs before the command can continue (it won't affect the total time).
For sure the router will need to be set up for port forwarding ... then scraping the external IP is a piece of cake, or DDNS could be used ...

I created a folder in: C:\Extensions\Vox Commando\plugins\TCP\html\keyLemon ... when I put an html page I can access it with explorer. However, when I try to access the wav file I get error 404 not found... is the web server limited to html or something ?

http://127.0.0.1:8088/html/KeyLemon/index.html ... works
http://127.0.0.1:8088/html/KeyLemon/Haddood.wav ... I get error 404

Meanwhile, for those like me who do not have VoxWav, one can use Windows Sound Recorder (see below)
SoundRecorder /FILE filename.filetype /DURATION hhhh:mm:ss
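
If the recording is launched from a script rather than from VC, the /DURATION value can be built from a number of seconds. A small Python sketch, assuming the duration is written as hours, minutes and seconds with zero padding; the function names are made up, and the argument list is only built here, not executed.

```python
def format_duration(total_seconds: int) -> str:
    """Format a number of seconds as hhhh:mm:ss for the /DURATION switch."""
    if total_seconds < 0:
        raise ValueError("duration must not be negative")
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:04d}:{minutes:02d}:{seconds:02d}"


def sound_recorder_args(wav_path: str, total_seconds: int) -> list:
    """Build the argument list; it could be handed to subprocess.run() on Windows."""
    return ["SoundRecorder", "/FILE", wav_path, "/DURATION", format_duration(total_seconds)]
```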

for the time being I will be triggering the recording with Prefix Start and we will see where that goes.

Also, uploading multiple files to the same model should increase the accuracy ...

The way I see it, this could work very nicely if James can add an option to save "Last Heard" as a wav file .... With a folder watcher approach, VC can trigger the recognition process once the file is saved ...

And to make it even better: if the phrase is shorter than 8 seconds, VC can duplicate the audio as many times as needed to reach 8 seconds ... in the end the system does not require the same phrase to be said ... let's say VC hears: "VC lights off" and that is 1.5 seconds ... VC would save a wav of "VC lights off VC lights off VC lights off VC lights off VC lights off VC lights off" and that is 9 seconds ... of course that depends on whether James is on board to experiment with that ...

Once a match is found ... VC can switch the SR profile to the right user ... or at least personalise responses

----- command for recording using Windows Sound Recorder

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<command id="705" name="1 - record voice data for model creation" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Uses Windows Sound Recorder to record your voice. Replace with your own file path. After, you can use these to create a speaker model (voice profile for a specific user). &#xD;&#xA;Recordings must be good quality and at least 4 or 5 seconds long.">
  <action>
    <cmdType>TTS.SpeakSync</cmdType>
    <params>
      <param>Start talking.</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Launch.Hidden</cmdType>
    <params>
      <param>c:\windows\system32\SoundRecorder.exe</param>
      <param> /FILE {M:Credentials.KeyLemonFolder}\{1}.wav /DURATION 0000:00:8</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>Record voice print for</phrase>
  <payloadFromXML phraseOnly="True" use2partPhrase="False" phraseConnector="by" Phrase2wildcard="anyone" optional="False">payloads\Users.xml</payloadFromXML>
</command>
Title: Re: Voice recognition (biometrics)
Post by: nime5ter on March 26, 2015, 05:23:55 PM
Quote from: Haddood
I created a folder in: C:\Extensions\Vox Commando\plugins\TCP\html\keyLemon ... when I put an html page I can access it with explorer. However, when I try to access the wav file I get error 404 not found... is the web server limited to html or something ?

Not limited to html per se (e.g. image files will work), however it *is* described as a "simple web server". :)

Interesting discovery.

Quote
and to make it even better if the phrase length is less than 8 seconds, VC can attach duplicate the file x of times to reach an 8 second ... at the end the system do not require the same phrase to be said ... let's say VC hears : "VC lights of" and that is 1.5 seconds ... VC  would save a wav "VC lights of VC lights of VC lights of VC lights of VC lights of VC lights of" and that is 9 seconds ...

That will not work. The issue is on the keylemon side. They require that *each wav file* analysed meets certain basic standards. If you send them a bunch of little wavs they will simply reject them.

It doesn't have to be 9 seconds though. I had decent luck with 4 to 5 seconds. But that's actually quite a mouthful.

It's nice to dream, but in the end we have to work with the cards we're given. If users really want to use this web service, the only semi-practical way is to settle on a particular pass phrase that meets their requirements.

... With regard to saving your voice command as a wav file even without VoxWav, I believe that James has already made your dream come true on that one (whenever the next release comes out). ;-)


Title: Re: Voice recognition (biometrics)
Post by: Haddood on March 26, 2015, 11:33:02 PM
Quote from: nime5ter
Not limited to html per se (e.g. image files will work), however it *is* described as a "simple web server". :)

Interesting discovery.

Does that mean I should look for another web server? Is there a place where I can find specs and docs for it?
I hope I will not need to install IIS or a full web server; it would kill my modest machine

Quote from: nime5ter
That will not work. The issue is on the keylemon side. They require that *each wav file* analysed meets certain basic standards. If you send them a bunch of little wavs they will simply reject them.

It doesn't have to be 9 seconds though. I had decent luck with 4 to 5 seconds. But that's actually quite a mouthful.

I meant that the words are repeated inside one file by repeating the wav audio multiple times, like copy-paste at the end, then saving the file
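
To show what repeating the audio inside one file could look like, here is a sketch using Python's standard `wave` module. It only handles plain PCM wav files, the 8-second target is the figure mentioned above, and `repeat_wav` is an invented name. Whether keylemon would accept or score such a file sensibly is not something this sketch can tell you.

```python
import math
import wave


def repeat_wav(src_path, dst_path, min_seconds=8.0):
    """Write dst_path with src_path's audio repeated until it lasts min_seconds.

    Returns the number of copies written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    duration = params.nframes / params.framerate
    if duration <= 0:
        raise ValueError("source recording is empty")
    copies = max(1, math.ceil(min_seconds / duration))
    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format; the frame count in the header is
        # corrected by the wave module when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames * copies)
    return copies
```

For a 1.5-second recording this writes six copies, which gives the 9 seconds from the example.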

Quote from: nime5ter
... With regard to saving your voice command as a wav file even without VoxWav, I believe that James has already made your dream come true on that one (whenever the next release comes out). ;-)

::yikes I can't wait ... Now I will be sleep deprived   ::saddest
I just hope that saving the wav will generate an event, with the length in seconds as payload  ::)

You guys are doing a fantastic job  :clap :clap


Quote from: nime5ter
It's nice to dream, but in the end we have to work with the cards we're given. If users really want to use this web service, the only semi-practical way is to settle on a particular pass phrase that meets their requirements.

It is dreams that got us out of the caves (or at least that is what I tell my students) ... ;)

Anyway, special thanks to you ... Your feedback and shared solutions make programming VC even more fun than using it
Title: Re: Voice recognition (biometrics)
Post by: nime5ter on March 27, 2015, 09:24:01 AM
Quote from: Haddood
Does that means I should look for another web server? Is there a place were I can find specs and docs for it?

The simplest solution would be for you to use curl, especially since you're already familiar with it.

According to the keylemon documentation (https://developers.keylemon.com/documentation/developer/rest_api/model_creation_speaker), both when creating a speaker model and for speaker verification, there are two ways you can provide them with the wav data (see the "urls" parameter):

Quote
Parameters

            Name          Description
Required    user 1        Your API username
            key 1         Your API key
            urls          Comma separated list of public URLs of audio files to use for the recognition. Additionally, audio files can be uploaded as multipart attachment of the POST request.
            models        A comma separated list of models ids or groups ids
            identities    A comma separated list of identities ids
Optional    max_result    The maximum number of results (default 10)
            async         Boolean. Perform recognition in asynchronous mode (default false)
            mean          Boolean. Compute the arithmetic mean of the results, if multiple samples are tested (default false)

In my example above, I'm using the public URL option, but you can instead try uploading your wav files as a multipart attachment using curl.

I believe you already know how to structure post requests in curl from your experience with Instapush? To post multipart data, see the documentation for curl's -F option. http://curl.haxx.se/docs/manpage.html
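
For reference, this is roughly what a multipart upload looks like on the wire, sketched in Python with the standard library only. It builds the request body and Content-Type header and sends nothing. The keylemon docs quoted above only say the audio can be sent "as multipart attachment", so the file field name used in the usage note is a guess, and `encode_multipart` is an invented helper.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Return (body, content_type_header) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    parts = []
    # Ordinary form fields, one part each.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part: headers, a blank line, then the raw bytes.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

A call such as `encode_multipart({"models": "12"}, "file", "authUser.wav", wav_bytes)` gives a body and header that could be passed to `urllib.request.Request`. With curl, the -F option builds the same kind of body for you.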