Author Topic: Voice recognition (biometrics) (Read 7466 times)

Haddood · « **on:** May 28, 2014, 04:40:28 AM »

Finally I found an open source library for voice recognition

http://mistral.univ-avignon.fr/mediawiki/index.php/Main_Page#The_ALIZE_Project

Can this be developed into a plugin? That way VC would be able to know who is speaking, not only what is being said.

This level of programming is way beyond my capacity ... or I would have given it a try.

jitterjames · « **Reply #1 on:** May 28, 2014, 08:54:40 AM »

What would you actually use this for? Surely not for security, and for voice bio-metrics to work the subject must generally speak a known pass-phrase. How is this more useful than simply using our existing speech recognition to tell the computer who you are?

Haddood · « **Reply #2 on:** May 28, 2014, 10:00:43 AM »

For example, if the pass phrase is the same as the prefix: when I say good morning, VC will ask me if I want the news, Bixi, etc. If my GF says good morning, it will tell her the weather, etc., or list the to-do list for that person, and start the coffee machine or the kettle based on who spoke.

Coming home, it will know who said "I am home". If it is secure enough, saying "I am back" in Skype or Mumble or even over an intercom will unlock the house door.

When saying "play some music", it will know which genre to play.

Saying "turn off my room light", it will know which room's light to turn off.
"Open Netflix" will open the right profile.

Not to mention switching SAPI 5 profiles automatically, based on who is speaking, to increase accuracy. MSSP is good, but with no dictation a lot of possibilities are eliminated.

Wherever there is a user profile, there is an application.

These are just a few applications off the top of my head.

I think it is one step toward VC with more AI.

jitterjames · « **Reply #3 on:** May 28, 2014, 10:58:36 AM »

My point is that I am pretty sure you won't want to use it for real security related commands like unlocking a door or disabling an alarm, so the only real application is to tell it who you are in order to customize your commands. But we can already do this very easily.

You can have a phrase "honey I'm home" which will tell it that it is you (Haddood), and another phrase "I am back darling" which will tell it that it is your wife. Either way you each have to remember a phrase; the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

In any case I also do not have the skills to adapt that library to .Net but I for one would not use it, nor would I want to waste all the extra processing power on such a thing unless it could reliably be used for security related tasks.

If you find a free .Net library, or if someone creates a .Net wrapper for this one, then I will take another look. Or if this program could be made as a standalone that is able to send UDP messages to VC through the network, that would work too.

Haddood · « **Reply #4 on:** May 28, 2014, 12:55:15 PM »

Quote from: jitterjames on May 28, 2014, 10:58:36 AM

My point is that I am pretty sure you won't want to use it for real security related commands like unlocking a door or disabling an alarm

James,
in fact there are commercial solutions targeting enterprise clients for voice biometrics, including identifying clients in call centers and redundant security. Face recognition can be tricked with a photo, but combined with voice recognition it becomes harder. Anyway, this is out of VC's scope. I just wanted to say that the technology in general seems pretty reliable (not necessarily this library).

Quote from: jitterjames on May 28, 2014, 10:58:36 AM

You can have a phrase "honey I'm home" which will tell it that it is you (Haddood), and another phrase "I am back darling" which will tell it that it is your wife. Either way you each have to remember a phrase; the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

Interesting approach. In fact I already programmed a command to switch the SR profile. However, it has a few limitations and drawbacks, like duplicating many commands, and switching the SR profile on the fly is going to be hard to achieve.

Quote from: jitterjames on May 28, 2014, 10:58:36 AM

If you find a free .Net library, or if someone creates a .Net wrapper for this one, then I will take another look.

Good to know, and it will help in finding the right stuff in the future.

Quote from: jitterjames on May 28, 2014, 10:58:36 AM

if this program could be made as a standalone that is able to send udp messages to VC through the network that would work too.

I was thinking of that as a possible substitute for a plugin: the program runs in the background and triggers events in VC by UDP or by command line with parameters.

Maybe someone in this forum with the right skills will volunteer for this task.
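
A minimal sketch of the "standalone program that sends UDP messages" idea, written in Python. It only shows the sending side. The port number and the payload format (`speaker.identified&&<name>`) are placeholders invented for illustration, not VC's documented UDP protocol, so check VC's own UDP settings and documentation before relying on either.

```python
import socket


def build_speaker_message(speaker: str) -> str:
    # Hypothetical payload format: an event name plus the speaker's name.
    # The real format VC expects must be taken from its documentation.
    return f"speaker.identified&&{speaker}"


def send_event(message: str, host: str = "127.0.0.1", port: int = 33000) -> int:
    """Send one UDP datagram carrying `message`; return the number of bytes sent.

    The default port is an assumed placeholder, not a confirmed VC setting.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # The speaker name would come from whatever biometric library is used.
    send_event(build_speaker_message("Haddood"))
```

UDP is fire-and-forget, so the identification program never blocks waiting for VC, but it also gets no confirmation that the message arrived.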

nime5ter · « **Reply #5 on:** May 28, 2014, 03:49:24 PM »

Quote from: Haddood on May 28, 2014, 12:55:15 PM

Interesting approach. In fact I already programmed a command to switch the SR profile. However, it has a few limitations and drawbacks, like duplicating many commands, and switching the SR profile on the fly is going to be hard to achieve.

Is there a specific reason you've chosen to duplicate commands when switching profiles?

Maybe I'm misunderstanding, but I can't think of a technical reason why one would need to do that. The same commands can be used for multiple users even if they use their own SR profiles. VC.SetProfile should be possible to implement on the fly.

In terms of the big picture here, this is sounding similar to discussions I've had in the past on the forum. Just as one example, I can imagine the following scenario being pretty much the same in a two-person home, whether or not one is using voice biometrics:

1> Command used to confirm identity -- requires either a predetermined biometric pass phrase or a predetermined non-biometric pass phrase (obviously the latter will fail only if the user doesn't know the right phrase, rather than based on audio analysis).

Either way: This command sets the speech profile for the right user (if this seems necessary), while also enabling user-specific groups and disabling other groups that are specific to another user. By naming groups consistently, this is quite straightforward.

There could still be other "neutral" groups in this common configuration that both parties can use. These are left untouched.

2 > If desired, the above command could also then trigger follow-up commands depending on which user is identified to continue a personalized dialogue. This could even be customized to the time of day or whatever.

... Thought I'd throw that out there for one and all, in case it's something that someone wants to try to implement at home.
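
The group-switching step described above can be sketched as a small piece of logic. This is an illustration in Python rather than VC actions, and it assumes a hypothetical naming convention in which user-specific groups start with `<user>_`; any group without such a prefix is treated as "neutral" and left untouched.

```python
def plan_group_switch(groups, users, active_user):
    """Return (enable, disable) lists of group names for the identified user.

    Assumes user-specific groups are named "<user>_something" (a made-up
    convention for this sketch). Neutral groups appear in neither list.
    """
    enable, disable = [], []
    for group in groups:
        # Find which user, if any, owns this group according to its prefix.
        owner = next(
            (u for u in users if group.lower().startswith(u.lower() + "_")),
            None,
        )
        if owner is None:
            continue  # neutral group: shared by everyone, leave as-is
        if owner == active_user:
            enable.append(group)
        else:
            disable.append(group)
    return enable, disable


if __name__ == "__main__":
    groups = ["naomi_music", "james_music", "lights"]
    print(plan_group_switch(groups, ["James", "Naomi"], "Naomi"))
```

The same plan applies whether the user was identified by a biometric check or by a plain pass phrase, which is the point being made in the post.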

jitterjames · « **Reply #6 on:** May 28, 2014, 05:41:46 PM »

Quote from: Haddood on May 28, 2014, 12:55:15 PM

Interesting approach. In fact I already programmed a command to switch the SR profile. However, it has a few limitations and drawbacks, like duplicating many commands, and switching the SR profile on the fly is going to be hard to achieve.

Quote from: nime5ter

VC.SetProfile should be possible to implement on the fly.

I guess it depends on what one means by "on the fly". The VC.SetProfile action performs the switch very quickly, and does not require you to make any other changes, but it can't be done while you are in the middle of speaking a voice command (so forget trying to use a prefix event). This will hold true with or without biometrics, or any other method of initiating the switch. The recognition engine needs to be turned off, the profile changed, and then the engine is turned back on. All of this is done automatically by the action VC.SetProfile

Haddood · « **Reply #7 on:** May 29, 2014, 06:37:35 PM »

Quote from: nime5ter on May 28, 2014, 03:49:24 PM

Is there a specific reason you've chosen to duplicate commands when switching profiles?

I think I wasn't clear in my post. I do not have duplicate commands at the moment. I meant that James's suggestion will create duplicate commands.

Quote from: jitterjames on May 28, 2014, 10:58:36 AM

You can have a phrase "honey I'm home" which will tell it that it is you (Haddood), and another phrase "I am back darling" which will tell it that it is your wife. Either way you each have to remember a phrase; the fact that you are each remembering a different phrase doesn't change anything in terms of usability.

Accordingly, I would need something similar to open Netflix, music, etc. as well. Adding my son to the equation will make it even more cumbersome. Of course this is not essential and can be solved by just switching profiles and giving commands with no personal reference, or with an explicit one like "play music for Haddood". It just might feel awkward depending on the phrase.

Quote from: jitterjames on May 28, 2014, 05:41:46 PM

I guess it depends on what one means by "on the fly". The VC.SetProfile action performs the switch very quickly, and does not require you to make any other changes, but it can't be done while you are in the middle of speaking a voice command (so forget trying to use a prefix event). This will hold true with or without biometrics, or any other method of initiating the switch. The recognition engine needs to be turned off, the profile changed, and then the engine is turned back on. All of this is done automatically by the action VC.SetProfile

I did not know that. However, that can be worked around easily by saying one of the phrases that has a personal interpretation (e.g. "my room"). It depends on how the voice recognition is set up:
1. Plugin: VC passes the wave info to the plugin and sets a variable; the plugin generates an event with the user. This event triggers a command that, based on the user and the variable for the command, triggers the right action.
2. Voice recognition running in parallel to VC: VC sets a variable and stops. Then VR triggers an event, and again a command does the branching.

There could be other scenarios. Those are just off the top of my head.

Finally, this might be daydreaming, as the VR technology (at least the one I found) might not be reliable at all.

jitterjames · « **Reply #8 on:** May 29, 2014, 06:49:22 PM »

I am officially lost.

I don't know if it is because I am not understanding you or you are not understanding me but I can't follow any more.

Haddood · « **Reply #9 on:** May 30, 2014, 05:23:55 PM »

No worries James ... I will dig into this a bit more once I have some time, and post my findings.

Haddood · « **Reply #10 on:** March 25, 2015, 12:07:04 PM »

Digging up an old topic ... I found that KeyLemon now supports speaker (voice) recognition as well as face recognition. They offer a REST API and even have a Python wrapper:
https://developers.keylemon.com/documentation/reference/libraries/wrappers
https://developers.keylemon.com/documentation/developer/basic/entry_points

They offer a free account that will cover VC users.

I can't get my head around the REST API. I tried the Python wrapper, but it needs another module (requests) and VC starts to generate errors.

If anybody can figure this out, VC will acquire face and voice recognition.

nime5ter · « **Reply #11 on:** March 25, 2015, 02:02:27 PM »

That seems pretty neat, in theory.

If you can come up with a method to upload a voice recording on the fly to a specific, known URL, the basics are easy.

Start by recording a wav of 5 seconds or more for it to use as its base model (more than one wav might be better -- I just used one). Upload the wav file.

Register on their site to get an API key & user name. I always put all that stuff in a map. In my example below the map is called keylemon.

Then you can create a "model" (for a particular person) using a Scrape.Post command similar to my example below.

You'd then have to have a system where a user says their special recognition phrase, which gets recorded and uploaded and compared to the model.

Code:

<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<commandGroup open="True" name="keylemon" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="689" name="create Naomi model" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Creates a speaker model for Naomi based on a wav file uploaded to the web. I used a Dropbox link.&#xD;&#xA;&#xD;&#xA;I've created a map called keylemon, in which I store my username, api key, and models etc. when they're created. Replace {M:keylemon.username}&amp;key={M:keylemon.APIkey} with your own API key etc.&#xD;&#xA;&#xD;&#xA;This command will store the model ID number for &quot;Naomi&quot; in my keylemon map (created by a successful HTTP post request).">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/model/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls=URL PATH FOR YOUR WAVE FILE.wav&amp;name=Naomi</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>"model_id":\s"(.*?)".*?"name":\s"(.*?)"</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.MatchToMap</cmdType>
      <params>
        <param>keylemon</param>
        <param>True</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>create Naomi</phrase>
  </command>
  <command id="690" name="recognize Naomi speaking" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Analyzes a wav file uploaded somewhere online. You have to give it the target URL. Recognizer returns a &quot;score&quot; out of 100. It's up to you to choose how high a number constitutes a recognized voice.">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/recognize/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls=URL PATH TO WHEREVER YOU ARE POSTING YOUR VOICE RECORDING&amp;models={M:keylemon.Naomi}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegEx</cmdType>
      <params>
        <param>"score":\s(.*?)\}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>{Match.1}&amp;&amp;85</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Voice authorization for Naomi failed. Try again or go away.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
      </then>
      <else>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Welcome Naomi. Switching to your profile now.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>OSD.ShowText</cmdType>
          <params>
            <param>Welcome Naomi. Switching to your profile now.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
      </else>
    </if>
    <phrase>Test the voice recognition for Naomi</phrase>
  </command>
</commandGroup>
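
For anyone who finds the HTTP side easier to read outside VC, here is a rough Python equivalent of the "create model" request in the first command above. It only builds the request object and does not send it. The endpoint, query parameters, and form fields are copied from the XML; the function name and the example values are made up for illustration, and the sketch percent-encodes the form body, which the XML version does not do.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL taken from the Scrape.Post actions in the command group.
API_BASE = "https://api.keylemon.com/api/speaker"


def build_model_request(username, api_key, wav_urls, name):
    """Build (but do not send) the POST that creates a speaker model.

    `wav_urls` is a list of publicly reachable wav file URLs; they are
    joined with commas into the single `urls` form field.
    """
    query = urlencode({"user": username, "key": api_key})
    body = urlencode({"urls": ",".join(wav_urls), "name": name}).encode("ascii")
    return Request(
        f"{API_BASE}/model/?{query}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder credentials and URL, for illustration only.
    req = build_model_request(
        "me", "secret", ["https://example.com/naomi.wav"], "Naomi"
    )
    print(req.get_method(), req.full_url)
    print(req.data)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, after which the response text could be searched for the model ID in the same way the `Results.RegExSingle` action does.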

Haddood · « **Reply #12 on:** March 25, 2015, 04:37:51 PM »

Wow that was fast

now time to play

nime5ter · « **Reply #13 on:** March 25, 2015, 05:07:22 PM »

Play is the operative word.

The http request part is easy, which is all I posted above, as that was what you said was the problem for you. (Pretty much the same as Instapush and all the other examples on the forum, Haddood.)

The problem is the practical implementation of uploading your voice recordings all the time, and then the question of how good a job they do with their analysis, etc. I compared the same recording to itself using their API, and got a "score" of 99 or something like that. I haven't tried to compare two (or more) different recordings.

I'm a bit skeptical and also tend to be wary of Internet-dependent solutions. Still, a fun experiment!

nime5ter · « **Reply #14 on:** March 26, 2015, 11:21:28 AM »

Last night I had time to expand the proof of concept. This uses VoxWav to record the "voice print" that gets uploaded to Dropbox.

Findings are:

a) You need pretty long voice recordings and pass phrases for this to work decently.
b) Depending on your Internet connection, the first attempt to recognize the voice print may not work and you'll have to re-try.
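
Finding (b) suggests wrapping the recognition call in a retry. A minimal sketch in Python, assuming the failure shows up as a network-level exception (`OSError`, which also covers `urllib.error.URLError`); the function name, retry count, and delay are arbitrary choices for illustration, not values anyone tested against the service.

```python
import time


def with_retries(attempt, tries=3, delay_s=2.0, sleep=time.sleep):
    """Call `attempt()` until it succeeds or `tries` attempts have failed.

    Only OSError is retried; any other exception propagates immediately.
    `sleep` is injectable so the wait can be skipped in tests.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for i in range(tries):
        try:
            return attempt()
        except OSError as err:
            last_error = err
            if i < tries - 1:
                sleep(delay_s)  # give the upload or connection time to settle
    raise last_error
```

In the VC command group posted here, the same effect is achieved manually through the "Re-try voice authorization" phrase.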

Note: If you're not using VoxWav with VC and/or you're not a -- let's say -- "advanced" VC user, this project should probably be avoided. ;-)
[Update: James has now added a standard action that you can use instead of VoxWav if you prefer: VcAdvanced.SaveRecoWav]

Four commands here:

1. Not really needed, but VoxWav users can use it to record their voice profile data if they want. This is used to create keylemon "models" (user profiles).

2. Use it to create user models. Sample command needs to be adapted to your user names and file paths etc.

3. "Begin User Authorization". See description in the command. This command automatically triggers the voice data processing command.

4. Voice processing command. Currently just tells you whether a voice was identified or not.

Code:

<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<commandGroup open="True" name="keylemon" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="705" name="1 - record voice data for model creation" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Use with VoxWav to record your voice. Replace with your own file path. After, you can use these to create a speaker model (voice profile for a specific user). Recordings must be good quality and at least 4 or 5 seconds long.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Start talking.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TcpMic.SaveNext</cmdType>
      <params>
        <param>local file path to Dropbox folder\{1}.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>Record voice print for</phrase>
    <payloadList>James, Naomi</payloadList>
  </command>
  <command id="689" name="2 - create model for {1}" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Creates a speaker model for {1} based on a wav file (or files) uploaded to the web. I used a Dropbox link -- saved to my map as {M:keylemon.dropboxURL}. Separate multiple wav file URLs with commas. I've created a map called keylemon, in which I store my username, api key, and models etc.">
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/model/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls={M:keylemon.dropboxURL}/{1}.wav&amp;name={1}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>"model_id":\s"(.*?)".*?"name":\s"(.*?)"</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.MatchToMap</cmdType>
      <params>
        <param>keylemon</param>
        <param>True</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>create model for</phrase>
    <payloadList>James, Naomi</payloadList>
  </command>
  <command id="690" name="Begin user authorization" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Works best if you use a pass phrase that matches one of your model phrases. Recognizer returns a &quot;score&quot; out of 100.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Please say your pass phrase now.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TcpMic.SaveNext</cmdType>
      <params>
        <param>LOCAL PATH TO YOUR DROPBOX\Public\keylemon\authUser.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>VC.SetEventTimer</cmdType>
      <params>
        <param>8s</param>
        <param>postvoicedata</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>Begin user authorization</phrase>
  </command>
  <command id="703" name="process voice data" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Compares voice data to my 2 user models (James and Naomi). Recognizer returns a &quot;score&quot; out of 100. Proof of concept. Doesn't do anything useful at the moment.">
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Processing data. Please wait.</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Scrape.Post</cmdType>
      <params>
        <param>https://api.keylemon.com/api/speaker/recognize/?user={M:keylemon.username}&amp;key={M:keylemon.APIkey}</param>
        <param>urls={M:keylemon.dropboxURL}/authUser.wav&amp;models={M:keylemon.Naomi},{M:keylemon.James}</param>
        <param />
        <param />
        <param>application/x-www-form-urlencoded</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="True">
      <ifType>LastActionSuccess</ifType>
      <ifParams>&amp;&amp;</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Voice processing failed.</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>OSD.ShowText</cmdType>
          <params>
            <param>Processing failed. {CR} If you think your recording was good, say: "Re-try voice authorization"</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.StopMacro</cmdType>
          <params />
          <cmdRepeat>1</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <action>
      <cmdType>Results.RegEx</cmdType>
      <params>
        <param>"name":\s"(.*?)",\s"score":\s(.*?)\}</param>
        <param> - </param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>85&amp;&amp;{Match.1.2}</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Match found. Welcome {Match.1.1}. Switching to profile for {Match.1.1}</param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.SetProfile</cmdType>
          <params>
            <param>{Match.1.1}</param>
          </params>
          <cmdRepeat>0</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <if ifBlockDisabled="False" ifNot="False">
      <ifType>(A)&lt;(B)</ifType>
      <ifParams>85&amp;&amp;{Match.2.2}</ifParams>
      <then>
        <action>
          <cmdType>TTS.Speak</cmdType>
          <params>
            <param>Match found. Welcome {Match.2.1}. </param>
          </params>
          <cmdRepeat>1</cmdRepeat>
        </action>
        <action>
          <cmdType>VC.SetProfile</cmdType>
          <params>
            <param>{Match.2.1}</param>
          </params>
          <cmdRepeat>0</cmdRepeat>
        </action>
      </then>
      <else />
    </if>
    <action>
      <cmdType>OSD.ShowText</cmdType>
      <params>
        <param>Results:</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>OSD.AddText</cmdType>
      <params>
        <param>{Match.{i}.1}: {Match.{i}.2}</param>
      </params>
      <cmdRepeat>{#M}</cmdRepeat>
    </action>
    <action>
      <cmdType>File.Delete</cmdType>
      <params>
        <param>LOCAL PATH TO YOUR DROPBOX\Public\keylemon\authUser.wav</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <event>postvoicedata</event>
    <phrase>Re-try voice authorization</phrase>
  </command>
</commandGroup>

Users will need to get their own username and API key from the keylemon website.
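
The score check in the "process voice data" command can also be sketched in Python. The regular expression is the one used in the `Results.RegEx` action and the threshold of 85 comes from the `if` blocks; the sample response text is a made-up shape that merely satisfies that expression, not a documented API response. Unlike the VC command, which tests each match separately, this sketch picks the single highest-scoring model above the threshold.

```python
import re

# Same pattern as the Results.RegEx action: captures (name, score) pairs.
PAIR_RE = re.compile(r'"name":\s"(.*?)",\s"score":\s(.*?)\}')


def best_match(response_text, threshold=85.0):
    """Return (name, score) for the best model scoring above `threshold`, else None."""
    scores = [
        (name, float(score)) for name, score in PAIR_RE.findall(response_text)
    ]
    accepted = [pair for pair in scores if pair[1] > threshold]
    if not accepted:
        return None
    return max(accepted, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical response text, shaped only to fit the regex above.
    sample = '{"results": [{"name": "Naomi", "score": 91}, {"name": "James", "score": 40}]}'
    print(best_match(sample))
```

Choosing one winner avoids switching profiles twice if both models happen to score above the threshold for the same recording.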

POST SCRIPT: Although this will work for those who don't mind an Internet-reliant system, and has a "fun factor" to it, it has no genuine advantage that I can see over simply having a dedicated pass phrase for each user -- without using biometrics (as I described near the beginning of this thread). The simple pass phrase ("Call me Ishmael", "My name is Luka") does not require Internet or waiting for said Internet service or extra levels of potential failure.
