Quick Start

To get started, follow the TTS Implementation Checklist.  You can read that topic and this one while the necessary files are downloading.

Introduction

CopiaFacts supports Microsoft Text-to-Speech (TTS) for voice prompts and messages.  There are currently two ways to implement this:

Microsoft Speech API (SAPI) 5.x for TTS is normally installed on Vista, Windows 7 and Server 2008.  It comes with one set of voice files ('Microsoft Anna') and another file is downloadable for Chinese language applications.  Third-party voices are also available.  Although some versions of SAPI may be supported on earlier operating systems, the quality is unlikely to prove acceptable.

Microsoft Speech Server 11.0 is available for download from http://www.microsoft.com/download/en/details.aspx?id=27225. IMPORTANT: download the 32-bit (x86) runtime package even if you have a 64-bit operating system.  We have tested this under Vista, Windows 7 and Server 2008.  Microsoft also documents that it can be deployed on Server 2003.

Note that CopiaFacts does not currently support Microsoft Speech Server 10 or earlier.  Version 11 is also incompatible with earlier Speech Server releases, so before installing version 11 we suggest checking in Control Panel / Programs and Features that no other Speech Server components or voices are installed.

After downloading and installing the Speech Server, you must also download one or more voice files.  The standard SAPI voice (Microsoft Anna) does not work with the Microsoft Speech Server.

Download TTS voice files from http://www.microsoft.com/download/en/details.aspx?id=27224. IMPORTANT: download the TTS voice files from the lower part of the list, NOT the Speech Recognition files from the top part of the list.  There are 27 different voices available, covering a variety of dialects and languages.

We strongly recommend installing the Microsoft Speech Server in preference to the SAPI included with the operating system.  The Speech Server is required for voices other than US English, and even for US English you will find that Microsoft Anna (provided with SAPI) mispronounces several words, while Microsoft Helen (Speech Server en-US) does a much better job.

Because the TTS operation creates a temporary WAV file, this feature cannot be used with older boards which do not support WAV file formats.

Configuration

CopiaFacts supports SAPI with CF8MSSAPI.DLL and Speech Server with CF8MSSPEECH.DLL.  In your configuration file, use ONE of the following $tts_dll commands:

$tts_dll * @PFC\CF8MSSAPI.DLL

$tts_dll * @PFC\CF8MSSPEECH.DLL

The above command can also take parameters to specify the default voice and the output WAV format, as described under $tts_dll.  Without a default voice parameter, TTS operations will use the first or only voice found.  For Speech Server, the voice can be overridden by an SSML element in the text to be spoken.
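As an illustration of the SSML voice override mentioned above, the following Python sketch builds a small SSML 1.0 document that selects a voice by name.  It is not part of CopiaFacts: the helper name build_ssml is hypothetical, and the voice name shown is an assumption that must be replaced with the exact name of a voice installed on your system.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML 1.0 document that speaks `text` using a named voice.

    Hypothetical helper for illustration only. The <voice name="..."> element
    is standard SSML; the voice name itself depends on what is installed.
    """
    return (
        '<speak version="1.0" xmlns="%s" xml:lang=%s>'
        "<voice name=%s>%s</voice>"
        "</speak>"
        % (SSML_NS, quoteattr(lang), quoteattr(voice_name), escape(text))
    )


if __name__ == "__main__":
    # Assumed voice name: check the names of the voices you actually installed.
    doc = build_ssml(
        "Thank you for calling.",
        "Microsoft Server Speech Text to Speech Voice (en-US, Helen)",
    )
    ET.fromstring(doc)  # raises ParseError if the document is not well-formed
    print(doc)
```

The text and attribute values are escaped so that characters such as '&' or '<' in the message cannot break the markup.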

Note that the two DLLs named above may have the same size and build time, but they are different.

Using TTS for prompts and custom messages

Two new infobox commands, $tts_text and $tts_file, are provided to specify text to be spoken.  One of these commands is used in place of $image_desc in an infobox (such as a $type question infobox) to specify the prompt which is to be played.  The TTS operation always creates a new temporary WAV file in the Windows temp folder and deletes it after it has been played.  If an asynchronous play is selected, the file is not deleted until the operation completes or is canceled.
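The temporary-file lifecycle described above (create, play, delete) can be sketched in Python as follows.  This is only an illustration of the pattern, not the CopiaFacts implementation: the function name play_tts_prompt, the play callback and the 8000 Hz mono 16-bit format are all assumptions.

```python
import os
import tempfile
import wave


def play_tts_prompt(pcm_frames, play, sample_rate=8000):
    """Write PCM audio to a temporary WAV file, play it, then delete it.

    `play` is a hypothetical callback that receives the file path. The file
    is removed even if playback raises an exception.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")  # created in the system temp folder
    os.close(fd)  # the wave module reopens the file by name
    try:
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)        # mono (assumed)
            wav.setsampwidth(2)        # 16-bit samples (assumed)
            wav.setframerate(sample_rate)
            wav.writeframes(pcm_frames)
        play(path)
    finally:
        os.remove(path)  # the file never outlives the play operation
    return path


if __name__ == "__main__":
    silence = b"\x00\x00" * 800  # 0.1 s of silence at 8000 Hz
    play_tts_prompt(silence, lambda p: print("playing", p))
```

For an asynchronous play, the equivalent of the finally clause would have to run when the operation completes or is canceled rather than when the call returns.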

The $tts_text command should be used to specify short phrases which are to be spoken in a prompt.  It is possible to include a small number of embedded XML markup elements, but either the string must start with a '<' or you must specify the IsXML keyword in the TTS_OPTIONS variable, to cause these to be interpreted.  For details, see the command description.
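The rule above for deciding whether embedded markup is interpreted can be expressed as a short Python sketch.  The function name treated_as_markup is hypothetical, and the handling of TTS_OPTIONS as a case-insensitive list of keywords separated by spaces or commas is an assumption; see the command description for the actual syntax.

```python
import re


def treated_as_markup(text, tts_options=""):
    """Return True if `text` would be interpreted as XML under the stated rule.

    Rule from the documentation: the string starts with '<', or the IsXML
    keyword is present in TTS_OPTIONS. Keyword parsing here is an assumption.
    """
    keywords = re.split(r"[,\s]+", tts_options.strip().lower())
    return text.startswith("<") or "isxml" in keywords


if __name__ == "__main__":
    for sample, options in [
        ("<emph>Welcome</emph> to the service", ""),
        ("Welcome to the <emph>service</emph>", ""),
        ("Welcome to the <emph>service</emph>", "IsXML"),
    ]:
        print(repr(sample), repr(options), treated_as_markup(sample, options))
```

The second sample shows the case to watch for: markup that appears only after plain text is not interpreted unless IsXML is specified.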

The $tts_file command should normally reference a .XML file although for simple messages the file may contain plain text only, with no markup.  You should normally use SSML markup in this file, which has a somewhat richer syntax than the simple XML, and allows fixed WAV segments to be embedded.  For details, see the command description.  The $image_desc command may also be used to name an XML file.

If an XML file is provided with the same pathname root as an infobox file, then as with WAV or VOX files, it can replace the $image_desc file and the $tts_file command.

In TTS text and TTS files, CopiaFacts variables are expanded using the defined e-mail variable-expansion character (default ` accent-grave).  In TTS files, you can also use the CopiaFacts Conditional Text feature to conditionally include phrases, enabling this with the TTS keyword in the CONDITIONAL_TEXT variable.  You should bear in mind that the XML syntax must remain valid when the conditionals are applied.
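Because a conditional can remove an opening tag without its matching close, it can be worth checking each variant of the expanded text for well-formedness before deploying it.  The following Python sketch shows one way to do that with the standard library; the function name check_well_formed is hypothetical, and the sample strings stand in for text that has already had variables and conditionals applied.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if `xml_text` parses as XML, else a description of the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical results of applying conditionals to the same TTS file.
    good = "<speak><p>Your order has shipped.</p></speak>"
    bad = "<speak><p>Your order has shipped.</speak>"  # closing </p> was dropped
    for sample in (good, bad):
        print(check_well_formed(sample) or "well-formed")
```

This checks XML syntax only; it does not confirm that the elements used are ones the TTS engine supports.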

Using TTS for digits and amounts playback

An XML file may be specified as the digit or amounts playback file, or instead of the multi-segment file in a $play_var operation.  In this case the value to be spoken is placed in the TTS_VALUE variable for the duration of the play, and the TTS_PREFIX and TTS_SUFFIX values contain the number from the specified prefix and suffix respectively.  Copia provides a sample XML file for each of digits and amounts, but if you have created custom multi-segment files, or used other provided samples, it will be necessary to create new files if you wish to substitute a TTS operation.

Sample DIGITS.XML to play image and phone numbers

Using TTS for standard voice prompts

We do not recommend using TTS for standard voice prompts, because it would involve the overhead of converting to a WAV file each time the message is played.  However, if you want the standard voice prompts to use the same voice as your custom prompts and messages, we provide a set of XML files which you can use with FFTESTTS to create prompts in the voice you have selected.  These can also be translated into other languages before creating WAV files from them.

Testing Prompts

The FFTESTTS command-line program is provided to prepare WAV files in advance, and optionally to listen to spoken prompts.  This can take either an XML filename or a text string and speak it using the default voice or the voice specified in FAXFACTS.CFG.  Obviously, a machine with sound output is required to hear the spoken text, but this is not needed to create WAV files, either in advance or at call time.

You can also use PHONESIM to test your IVR applications which use Text to Speech.  Pre-recorded WAV files from FFTESTTS and dynamically-created files from XML are spoken in PHONESIM like any other WAV file.

Enabling CF8MSSAPI or CF8MSSPEECH in FFTRACE will provide information about the conversion and about any errors encountered.

Sample Application

The IVR application OPTOUT is an example of a complete application written to use TTS.