Author Topic: Payload dictation for SP version

marcusvdt

  • Sr. Member
  • ****
  • Posts: 152
  • Karma: 6
  • Researching
    • View Profile
Payload dictation for SP version
« on: April 10, 2015, 06:42:00 PM »
Hi!
I just discovered that the SP version of Voice Commando does not allow open dictation payloads, right?
Is it possible to enable open dictation on it so that I can interact with a Google search, for example?

For a Google search, I will always need to say something that can't be predicted in advance, as I do when I use Samsung's VR engine on my phone, where I can say:
"Google notícias de santos" (that is, "Google news from Santos")
In this specific example, my phone will use an online engine to convert my speech to text and it will search for it.

I'm wondering if I can have more flexibility with VC when interacting with applications. For example, I'd like to search my personal videos by using open dictation to enter the name of the video, rather than a predefined payload XML.

I was briefly reading the MS docs for SP 11 and it seems to be possible. Of course, the details of how it is actually done are beyond my current knowledge, so sorry if I'm asking for something too difficult to achieve.


This feature is highly desirable for my plans with VC in the near future. For example, I would like to be able to dictate a new calendar entry and the like.

Please correct me and clarify if I'm talking complete bullshit.  :bonk

Thanks.
« Last Edit: April 10, 2015, 06:51:02 PM by marcusvdt »

jitterjames

  • Administrator
  • Hero Member
  • *****
  • Posts: 7715
  • Karma: 116
    • View Profile
    • VoxCommando
Re: Payload dictation for SP version
« Reply #1 on: April 10, 2015, 07:30:51 PM »
It is simply not possible, no matter how much we would like it to be.  That is why we did not implement it.

Where in those documents that you refer to does it say that dictation is supported?  If I search the page that you link to for "dictation", nothing is found.

Even if it were supported, it is unlikely that it would yield very good results.  The SP engine uses lower-fidelity audio and does not support training.

marcusvdt

Re: Payload dictation for SP version
« Reply #2 on: April 10, 2015, 08:13:21 PM »
No, it does not literally say dictation. For lack of knowledge of how VR engines actually work, I may have mistakenly understood that the engine converts whatever you say into text, but it seems I'm completely lost as to how VR actually works in any software.

On this page they say there is a conversion from speech to text, and this is what I based my original question on.
https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
Quote
A speech recognition engine (or speech recognizer) takes an audio stream as input and turns it into a text transcription.


marcusvdt

Re: Payload dictation for SP version
« Reply #3 on: April 10, 2015, 08:34:14 PM »
Reading more now, and getting a better idea.
https://msdn.microsoft.com/en-us/library/hh378458%28v=office.14%29.aspx

I thought VR engines were able to construct words from scratch, based on the phonetic sounds combined with the language locale that is set. That way I would be able to dictate a word that does not exist in my language just by saying the phonetics of each part of the word...
But it seems the engine needs a grammar in order to match the phonetic sounds against expected words. So, without a grammar containing all the possible words that you may want to say, it has nothing to match against and hence will just ignore unknown sounds.
Then, if this is the only way a VR engine is able to work (which now makes sense to me), how in the hell can some of the so-called assistants identify what I'm saying for a web search or for creating a new note on my phone?
Do they have a huge collection of all the possible grammars available for a spoken language? Maybe they update the grammar themselves with information from users?
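
To make the grammar idea above concrete, here is a toy sketch in Python. It is only an analogy: it matches typed text against a fixed list, whereas a real engine compares audio against the phrases its grammar allows. The command phrase, the payload list and the function name are all made up for illustration and are not taken from VC or from the MS docs.

```python
# Toy illustration only: matches text, not audio. All names are hypothetical.
from typing import Optional

# A tiny "grammar": one command phrase followed by one payload from a fixed list.
COMMAND = "play video"
PAYLOADS = ["holiday 2014", "birthday party", "beach trip"]


def recognize(utterance: str) -> Optional[str]:
    """Return the matched payload, or None if the utterance is outside the grammar."""
    text = utterance.lower().strip()
    # The utterance must start with the command phrase...
    if not text.startswith(COMMAND + " "):
        return None
    # ...and the remainder must be one of the predefined payloads.
    rest = text[len(COMMAND) + 1:]
    return rest if rest in PAYLOADS else None


print(recognize("play video beach trip"))          # payload is in the list
print(recognize("play video noticias de santos"))  # payload is not in the list
```

The first call returns the payload and the second returns nothing, which is roughly what happens when you say something that is not in a predefined payload list: there is nothing for it to match, so it is ignored.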