Author Topic: Voice recognition (biometrics)  (Read 7469 times)


Haddood

  • $upporter
  • Hero Member
  • *****
  • Posts: 688
  • Karma: 22
    • View Profile
Re: Voice recognition (biometrics)
« Reply #15 on: March 26, 2015, 03:58:36 PM »
I am having trouble sending the files through Dropbox, which made me think of using the web server in the TCP plugin instead. That would also avoid the upload time Dropbox needs before the command can continue (it won't affect the total time).
The router will of course need to be set up for port forwarding. After that, scraping the external IP is a piece of cake, or I can use DDNS.

I created a folder at C:\Extensions\Vox Commando\plugins\TCP\html\keyLemon. When I put an HTML page in it, I can access it with Explorer. However, when I try to access the wav file I get error 404 (not found). Is the web server limited to HTML or something?

http://127.0.0.1:8088/html/KeyLemon/index.html ... works
http://127.0.0.1:8088/html/KeyLemon/Haddood.wav ... I get error 404

Meanwhile, those who (like me) do not have VoxWav can use Windows Sound Recorder from the command line (see the command below):
SoundRecorder /FILE filename.filetype /DURATION hhhh:mm:ss

For the time being I will trigger the recording with Prefix Start, and we will see where that goes.

Also, uploading multiple files to the same model should increase the accuracy.

The way I see it, this could work very nicely if James can add an option to save "Last Heard" as a wav file. With a folder-watcher approach, VC can then trigger the recognition process once the file is saved.
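
Outside of VC, the folder-watcher idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `find_new_wavs` and `watch` are made up, polling is used instead of a real file-system notification, and the `on_new` callback stands in for whatever would start the recognition.

```python
import os
import time


def find_new_wavs(folder, seen):
    """Return full paths of .wav files in `folder` whose names are not in `seen`.

    Every file name returned is also added to `seen`, so the next call
    only reports files that appeared since then.
    """
    new_files = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".wav") and name not in seen:
            seen.add(name)
            new_files.append(os.path.join(folder, name))
    return new_files


def watch(folder, on_new, interval=1.0):
    """Poll `folder` forever and call `on_new(path)` for each new wav file.

    Caveat: a file may be reported while it is still being written,
    so a real implementation would need to wait until the save is complete.
    """
    seen = set(os.listdir(folder))  # ignore files that already exist
    while True:
        for path in find_new_wavs(folder, seen):
            on_new(path)
        time.sleep(interval)
```

Calling `watch(some_folder, print)` would simply print the path of every wav that shows up in that folder.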

To make it even better: if the phrase is shorter than 8 seconds, VC could append copies of the recording to itself until it reaches 8 seconds. After all, the system does not require the same phrase to be said. Let's say VC hears "VC lights off" and that takes 1.5 seconds: VC would save a wav containing "VC lights off VC lights off VC lights off VC lights off VC lights off VC lights off", which is 9 seconds. Of course, that depends on whether James is on board to experiment with that.

Once a match is found, VC can switch the SR profile to the right user, or at least personalise responses.

----- Command for recording using Windows Sound Recorder

Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.4.2-->
<command id="705" name="1 - record voice data for model creation" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Use with VoxWav to record your voice. Replace with your own file path. After, you can use these to create a speaker model (voice profile for a specific user). &#xD;&#xA;Recordings must be good quality and at least 4 or 5 seconds long.">
  <action>
    <cmdType>TTS.SpeakSync</cmdType>
    <params>
      <param>Start talking.</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Launch.Hidden</cmdType>
    <params>
      <param>c:\windows\system32\SoundRecorder.exe</param>
      <param> /FILE {M:Credentials.KeyLemonFolder}\{1}.wav /DURATION 0000:00:8</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>Record voice print for</phrase>
  <payloadFromXML phraseOnly="True" use2partPhrase="False" phraseConnector="by" Phrase2wildcard="anyone" optional="False">payloads\Users.xml</payloadFromXML>
</command>
« Last Edit: March 27, 2015, 12:36:49 AM by Haddood »
When Voice command gets tough, use hand gestures

nime5ter

  • Administrator
  • Hero Member
  • *****
  • Posts: 1999
  • Karma: 61
    • View Profile
    • Getting Started with VoxCommando
Re: Voice recognition (biometrics)
« Reply #16 on: March 26, 2015, 05:23:55 PM »
Quote
I created a folder at C:\Extensions\Vox Commando\plugins\TCP\html\keyLemon. When I put an HTML page in it, I can access it with Explorer. However, when I try to access the wav file I get error 404 (not found). Is the web server limited to HTML or something?

Not limited to HTML per se (e.g. image files will work); however, it *is* described as a "simple web server". :)

Interesting discovery.

Quote
To make it even better: if the phrase is shorter than 8 seconds, VC could append copies of the recording to itself until it reaches 8 seconds. After all, the system does not require the same phrase to be said. Let's say VC hears "VC lights off" and that takes 1.5 seconds: VC would save a wav containing "VC lights off VC lights off VC lights off VC lights off VC lights off VC lights off", which is 9 seconds ...

That will not work. The issue is on the keylemon side. They require that *each wav file* analysed meets certain basic standards. If you send them a bunch of little wavs they will simply reject them.

It doesn't have to be 9 seconds though. I had decent luck with 4 to 5 seconds. But that's actually quite a mouthful.

It's nice to dream, but in the end we have to work with the cards we're given. If users really want to use this web service, the only semi-practical way is to settle on a particular pass phrase that meets their requirements.

... With regard to saving your voice command as a wav file even without VoxWav, I believe that James has already made your dream come true on that one (whenever the next release comes out). ;-)


TIPS: POST VC VERSION #. Explain what you want VC to do. Say what you've tried & what happened, or post a video demo. Attach VC log. Link to instructions followed.  Post your command (xml)

Haddood

Re: Voice recognition (biometrics)
« Reply #17 on: March 26, 2015, 11:33:02 PM »
Quote
Not limited to html per se (e.g. image files will work), however it *is* described as a "simple web server". :)

Interesting discovery.

Does that mean I should look for another web server? Is there a place where I can find specs and docs for it?
I hope I will not need to install IIS or a full web server; that would kill my modest machine.

Quote
That will not work. The issue is on the keylemon side. They require that *each wav file* analysed meets certain basic standards. If you send them a bunch of little wavs they will simply reject them.

It doesn't have to be 9 seconds though. I had decent luck with 4 to 5 seconds. But that's actually quite a mouthful.

I meant that the words are repeated inside one file by repeating the wav data multiple times, like copying and pasting it at the end and then saving the file.
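
Here is a rough sketch of what I mean, written in Python rather than inside VC. The function name `repeat_wav` and its parameters are made up for illustration, and whether KeyLemon would accept a file built this way is untested.

```python
import math
import wave


def repeat_wav(src_path, dst_path, min_seconds=8.0):
    """Copy src_path to dst_path, repeated end to end until it lasts
    at least min_seconds. Returns the number of copies written."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        n_frames = src.getnframes()
        frames = src.readframes(n_frames)
        duration = n_frames / float(src.getframerate())

    if duration <= 0:
        raise ValueError("source wav contains no audio")

    # Smallest whole number of copies that reaches the target length.
    copies = max(1, math.ceil(min_seconds / duration))

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The wave module fixes up the frame count in the header on close.
        dst.writeframes(frames * copies)
    return copies
```

With a 1.5 second recording and `min_seconds=8.0`, this writes 6 copies, which gives the 9 seconds from my example.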

Quote
... With regard to saving your voice command as a wav file even without VoxWav, I believe that James has already made your dream come true on that one (whenever the next release comes out). ;-)

::yikes I can't wait ... Now I will be sleep deprived   ::saddest
I just hope that saving the wav will generate an event, with the length in seconds as payload  ::)

You guys are doing a fantastic job  :clap :clap


Quote
It's nice to dream, but in the end we have to work with the cards we're given. If users really want to use this web service, the only semi-practical way is to settle on a particular pass phrase that meets their requirements.

It is dreams that got us out of the caves (or at least that is what I tell my students) ... ;)

Anyway, special thanks to you. Your feedback and the solutions you share make programming VC even more fun than using it.
« Last Edit: March 26, 2015, 11:39:20 PM by Haddood »

nime5ter

Re: Voice recognition (biometrics)
« Reply #18 on: March 27, 2015, 09:24:01 AM »
Quote
Does that mean I should look for another web server? Is there a place where I can find specs and docs for it?

The simplest solution would be for you to use curl, especially since you're already familiar with it.

According to the keylemon documentation (https://developers.keylemon.com/documentation/developer/rest_api/model_creation_speaker), both when creating a speaker model and for speaker verification, there are two ways you can provide them with the wav data (see the "urls" parameter):

Quote
Parameters

Name          Description
Required
  user 1      Your API username
  key 1       Your API key
  urls        Comma separated list of public URLs of audio files to use for the recognition. Additionally, audio files can be uploaded as multipart attachment of the POST request.
  models      A comma separated list of models ids or groups ids
  identities  A comma separated list of identities ids
Optional
  max_result  The maximum number of results (default 10)
  async       Boolean. Perform recognition in asynchronous mode (default false)
  mean        Boolean. Compute the arithmetic mean of the results, if multiple samples are tested (default false)

In my example above, I'm using the public URL option, but you can instead try uploading your wav files as a multipart attachment using curl.

I believe you already know how to structure post requests in curl from your experience with Instapush? To post multipart data, see the documentation for curl's -F option. http://curl.haxx.se/docs/manpage.html
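
If curl turns out to be awkward, the same kind of multipart POST can be put together by hand. Below is a minimal Python sketch using only the standard library. The helper names, the form field name `file`, and the file name `sample.wav` are assumptions for illustration, and the endpoint URL is left for you to fill in; check the keylemon documentation for the real values.

```python
import uuid
import urllib.request


def build_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Return (body, content_type_header) for a multipart/form-data POST.

    `fields` holds the plain text parameters (e.g. user, key, models);
    the file is attached as one extra part named `file_field`.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(("--" + boundary).encode())
        parts.append(('Content-Disposition: form-data; name="%s"' % name).encode())
        parts.append(b"")
        parts.append(str(value).encode())
    parts.append(("--" + boundary).encode())
    parts.append(
        ('Content-Disposition: form-data; name="%s"; filename="%s"'
         % (file_field, filename)).encode()
    )
    parts.append(("Content-Type: " + content_type).encode())
    parts.append(b"")
    parts.append(file_bytes)
    parts.append(("--" + boundary + "--").encode())
    parts.append(b"")
    return b"\r\n".join(parts), "multipart/form-data; boundary=" + boundary


def post_wav(url, fields, wav_path):
    """POST the wav at `wav_path` plus `fields` to `url`; return (status, response body)."""
    with open(wav_path, "rb") as f:
        wav_bytes = f.read()
    # "file" and "sample.wav" are assumed names, not taken from the keylemon docs.
    body, header = build_multipart(fields, "file", "sample.wav", wav_bytes)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()
```

The `fields` dictionary would carry the `user`, `key` and `models` parameters from the table quoted above.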
« Last Edit: March 27, 2015, 09:31:22 AM by nime5ter »