VoxCommando

VoxNastics (User Guides and Mods) => XML Exchange => Topic started by: nime5ter on April 03, 2016, 12:55:36 PM

Title: Free cloud-based TTS / Облако основе TTS на английском или русск
Post by: nime5ter on April 03, 2016, 12:55:36 PM
Yandex offers a free (at least for now) cloud-based TTS API. It supports only Russian and English. (The documentation says Russian only, but English works for me.)

Of course, the TTS will not work unless you're connected to the internet.

1. You need to register by email in order to get your own API key. https://tech.yandex.com/speechkit/cloud/

2. You will need to replace "{M:API.ydx.speechkit.key}" in the XML below with your own API key.

3. Note: For the Russian TTS to work, you will need to have VC 2.2.1.7 or later installed.

This is because Russian-language text needs to be properly URL encoded. In 2.2.1.7 (http://voxcommando.com/mediawiki/index.php?title=ChangeLog#Version_2.2.1.7), James has added a new Tools.Encode.URI action.

Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.1.7-->
<commandGroup open="True" name="yandex" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="304" name="++Speak to me Yandex (use an action like this example in any command that requires TTS)" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
    <action>
      <cmdType>VC.TriggerEvent</cmdType>
      <params>
        <param>Yandex.TTS</param>
        <param>Greetings, earth being.</param>
        <param>en</param>
        <param>jane</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>VC.Pause</cmdType>
      <params>
        <param>2000</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>VC.TriggerEvent</cmdType>
      <params>
        <param>Yandex.TTS</param>
        <param>Я плохо говорю по-русски</param>
        <param>ru</param>
        <param>zahar</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <phrase>Speak to me Yandex</phrase>
  </command>
  <command id="159" name="Yandex TTS" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="{1} phrase you want spoken (Russian or English only)&#xD;&#xA;{2} --&gt; lang code options are ru-RU, en-GB. Officially, only Russian is supported, but English seems to work.&#xD;&#xA;{3} --&gt; speaker options are: &#xD;&#xA;female voices: jane or omazh&#xD;&#xA;male voices: zahar or ermil&#xD;&#xA;You will need your own API key to replace {M:API.ydx.speechkit.key}. See: https://tech.yandex.com/speechkit/cloud/doc/dg/concepts/speechkit-dg-tts-docpage/">
    <action>
      <cmdType>Tools.Encode.URI</cmdType>
      <params>
        <param>{1}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Sound.PlayStream</cmdType>
      <params>
        <param>https://tts.voicetech.yandex.net/generate?text={LastResult}&amp;format=mp3&amp;lang={2}&amp;speaker={3}&amp;key={M:API.ydx.speechkit.key}&amp;emotion=mixed</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <event>Yandex.TTS</event>
  </command>
</commandGroup>

To customize the TTS voice, see the documentation here: https://tech.yandex.com/speechkit/cloud/doc/dg/concepts/speechkit-dg-tts-docpage/
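
For anyone curious why step 3 matters, here is a minimal Python sketch of what the two actions in the "Yandex TTS" command appear to do: percent-encode the phrase, then put it into the request URL. The endpoint and parameter names are copied from the Sound.PlayStream action in the XML above. The helper name build_tts_url and the placeholder key are made up for illustration, and I am assuming Tools.Encode.URI does ordinary UTF-8 percent-encoding.

```python
from urllib.parse import quote

# Endpoint copied from the Sound.PlayStream action in the XML above.
TTS_ENDPOINT = "https://tts.voicetech.yandex.net/generate"


def build_tts_url(text, lang, speaker, key, emotion="mixed", fmt="mp3"):
    """Hypothetical helper: build the request URL the way the XML does."""
    # Cyrillic characters and spaces are not valid in a raw URL, so the phrase
    # is percent-encoded as UTF-8 (assumed to match Tools.Encode.URI).
    encoded_text = quote(text, safe="")
    return (
        f"{TTS_ENDPOINT}?text={encoded_text}&format={fmt}"
        f"&lang={lang}&speaker={speaker}&key={key}&emotion={emotion}"
    )


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_tts_url("Я плохо говорю по-русски", "ru", "zahar", "YOUR_API_KEY"))
```

The English example survives without encoding only because it is plain ASCII; the Russian phrase becomes a run of %D0.. and %D1.. sequences, which is what the service needs to receive.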
Title: Re: Free cloud-based TTS / Облако основе TTS на английском или русс
Post by: nime5ter on April 03, 2016, 01:27:10 PM
In case it's not self-evident: to use this as your main TTS solution, replace the TTS.Speak actions in your commands with VC.TriggerEvent actions that call the Yandex TTS command (as I demonstrate in my XML example above).
Title: Re: Free cloud-based TTS / Облако основе TTS на английском или русс
Post by: Ginto on April 23, 2016, 09:42:47 PM
TTS is nice, of course, but I'm much more interested in how to use speech recognition. Both Google and Yandex require you to send them a WAV file, and I can't figure out how to do that with VoxCommando. Starting a recording with a third-party program, monitoring the folder with Watcher, and then still having to send the file in a POST request... That ends up far too convoluted, and the more complex a system is, the more easily it breaks. A form using webkitSpeechRecognition probably won't work in RoboBrowser. Is the simplest option to script a mouse click on the microphone in Google and then copy the text from the page element? I really don't like any of these GUI-based hacks; there must be a more reliable way.