Comparison of Text to Speech Solutions for React Native

Daniel Idaszak

Updated Dec 12, 2022 • 10 min read

A comparison of text to speech solutions

There are many text to speech solutions on the market, ranging from Google Cloud Text to Speech, Microsoft Azure Text to Speech, and Amazon Polly to the engines natively implemented on Android and iOS devices.

While the native engines seem easier to implement, are reliable, and can be used without an internet connection, the “bigger” cloud solutions are backed by machine learning and provide higher quality voices. Their biggest downside is support for React Native: while all of them have official iOS and Android SDKs, it’s really hard to find a library for React Native.
In this article, we compare the native solutions for plain React Native apps and for Expo apps with the REST API of Google Cloud Text to Speech.

React-Native-TTS library

The simplest way to implement Text to Speech functionality in React Native apps is by using the React-Native-TTS library. It is very easy to set up and the library is ready to use with auto-linking from React-Native version 0.60+.
What is the biggest advantage of this solution? Native text to speech engines on both platforms allow you to easily use this feature in your app, and, more importantly, enable you to use some of the voices without a network connection.

The downside of this solution is the lack of consistency. Android and iOS have different text to speech engines, so the selection of voices is different on each platform. This could be an issue if you want to create a chatbot that speaks with its own, recognizable voice.

You can choose from 281 voices in different languages on Android and 38 provided by Apple on iOS.
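Since the list of voices differs per platform and per device, it can help to pick a voice at runtime instead of hard-coding one. Below is a minimal sketch, assuming the voice objects have the shape returned by Tts.voices() in React-Native-TTS (id, language, and the Android-only flags networkConnectionRequired and notInstalled). The helper name pickOfflineVoice is hypothetical.

```javascript
// Sketch: pick an installed voice for a language that works offline.
// `voices` is assumed to have the shape returned by Tts.voices().
const pickOfflineVoice = (voices, language) =>
  voices.find(
    (voice) =>
      voice.language === language &&
      !voice.networkConnectionRequired && // flag reported on Android only
      !voice.notInstalled // flag reported on Android only
  ) || null

// Possible usage inside an app (not executed here):
// const voice = pickOfflineVoice(await Tts.voices(), 'en-GB')
// if (voice) Tts.setDefaultVoice(voice.id)
```

On iOS the two flags are simply missing, so every voice for the requested language passes the filter.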

While configuring your voice you can set parameters like:

  • Language
    Tts.setDefaultLanguage('en-IE');
  • Voice
    Tts.setDefaultVoice('com.apple.ttsbundle.Moira-compact');
  • Speech rate
    Tts.setDefaultRate(0.6);
  • Pitch
    Tts.setDefaultPitch(1.5);

On Android there are also Pan and Volume:

Tts.speak('Hello, world!', { androidParams: { KEY_PARAM_PAN: -1, KEY_PARAM_VOLUME: 0.5 } });
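Because these parameters only exist on Android, a small helper can build the options object per platform. The sketch below is an assumption about how you might structure this, not part of the library: the names clamp and buildSpeakOptions are hypothetical, and the ranges (pan from -1 to 1, volume from 0 to 1) follow the Android TextToSpeech documentation.

```javascript
// Keep a number inside the [min, max] range.
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))

// Sketch: build the second argument of Tts.speak for the given platform.
// On platforms other than Android an empty options object is returned.
const buildSpeakOptions = (platformOS, { pan = 0, volume = 1 } = {}) =>
  platformOS === 'android'
    ? {
        androidParams: {
          KEY_PARAM_PAN: clamp(pan, -1, 1),
          KEY_PARAM_VOLUME: clamp(volume, 0, 1),
        },
      }
    : {}

// Possible usage inside an app (not executed here):
// Tts.speak('Hello, world!', buildSpeakOptions(Platform.OS, { pan: -1, volume: 0.5 }))
```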

Setup is as easy as typing:

yarn add react-native-tts

and the library should link itself while running pod install.

Let’s create a simple component with text to speech:


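import React from 'react'
import { Button, Platform } from 'react-native'
import Tts from 'react-native-tts'

Tts.setDefaultLanguage('en-GB')
// This voice id belongs to Apple's engine, so only set it on iOS
if (Platform.OS === 'ios') {
 Tts.setDefaultVoice('com.apple.ttsbundle.Daniel-compact')
}

// A button that reads a fixed sentence aloud when pressed
const NativeSpeech = () => (
 <Button
   title="Speak!"
   onPress={() => Tts.speak('Hello World!')}
 />
)

export default NativeSpeech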

Speech React Native Expo module

If you are using Expo, the React-Native-TTS library is not for you. Instead, you should use the Expo-speech SDK. You just need to run:

expo install expo-speech

If you want to use this library in a non-Expo project, you can use the unimodules package. Just follow the unimodules documentation, run expo install expo-speech, and then run pod install.

The speech library from Expo uses the same native text to speech engines as the React-Native-TTS library, so you can use the same voices you would use with that library, and the configuration process is similar. For more info, head to the expo-speech documentation.
Let’s create a simple component with the expo speaking service:


import React from 'react'
import { Button } from 'react-native'
import * as Speech from 'expo-speech'
 
const NativeSpeech = () => (
 <Button title="Speak!" onPress={() => Speech.speak('Hello World!')} />
);
 
export default NativeSpeech
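The configuration mentioned above is passed as a second argument to Speech.speak. Here is a minimal sketch, assuming the language, pitch, and rate options of expo-speech. The helper name speakConfigured is hypothetical, and the module is passed in as a parameter only to keep the example self-contained.

```javascript
// Sketch: speak a text with explicit language, pitch and rate.
// `speechModule` is expected to be the module imported from 'expo-speech'.
const speakConfigured = (speechModule, text) =>
  speechModule.speak(text, {
    language: 'en-GB',
    pitch: 1.5,
    rate: 0.6,
  })

// Possible usage inside an app (not executed here):
// import * as Speech from 'expo-speech'
// speakConfigured(Speech, 'Hello World!')
```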

Google Cloud Text to Speech

Google Cloud Text to Speech is a solution without a client library for React Native. We can use its REST API to request a base64 encoded mp3 of our text. The biggest advantage of this service is the ability to choose one voice that is available on both platforms. It’s also highly configurable: you can choose from a set of standard voices or from a library of higher quality voices.

Head to the Google Text to Speech website, where you can test all of the available voices in the demo panel.

In order to start using text to speech, hit the “Get started for free” button and create an account.

Once you are ready, the first thing to do is to create an API key. An unrestricted API key is enough to test text to speech, but if you want to use it in a real app, you should restrict this key as described in this blog post.

Open the Google Cloud Console and from the menu choose “APIs & services”, then head to the credentials section. Click “Create Credentials” and choose “API key”. Once the key is created, click the “Restrict key” button to open the key settings.

Choose a name for your key. We will be creating one key for the Android app and one for the iOS app, so you can name them “API key - iOS” and “API key - Android”. Then go to the application restrictions section.

There you can restrict each API key to your app bundle. A restricted key is much harder to misuse outside of your app, even if an attacker manages to extract it. This is why you have to prepare two different keys, one for Android and one for iOS. If you want, you can also restrict those keys to Text to Speech only.

Now you can store your keys in your ENV file by using the React-Native-Config library and then read the API key that matches the current platform.
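Reading the key per platform can be sketched as a small helper. The variable names KEY_IOS and KEY_ANDROID match the ones used later in this article. The helper name getApiKey and the error message are hypothetical, and the config object is passed in as a parameter so that the example stays self-contained.

```javascript
// Sketch: return the API key for the current platform.
// `config` is expected to be the object exported by react-native-config.
const getApiKey = (platformOS, config) => {
  const key = platformOS === 'ios' ? config.KEY_IOS : config.KEY_ANDROID
  if (!key) {
    // Fail early instead of sending a request without a key
    throw new Error(`Missing Text to Speech API key for ${platformOS}`)
  }
  return key
}

// Possible usage inside an app (not executed here):
// const key = getApiKey(Platform.OS, Config)
```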

Google’s Text to Speech is a REST API, and there is no client library for React Native. That’s why we need to fetch the base64 encoded audio into our app and then play it.

A huge advantage of using this solution is better quality voices backed by machine learning. You can choose from a huge list of voices for every language. Every language has a few voices, both male and female, in a standard variant and in a WaveNet variant, which is priced a little higher because of its better quality. You can check them all out on the Reference list of Google Text to Speech voices. Setting the voice of your choice is as simple as adding the setting to the body of an API call:


voice: {
     languageCode: 'en-US',
     name: 'en-US-Standard-B',
     ssmlGender: 'FEMALE'
   }

Let’s create an API call to fetch this data.
Our goal is to fetch data from the https://texttospeech.googleapis.com/v1/text:synthesize endpoint, so first, we can create a function returning an object with our headers and body:


const createRequest = text => ({
 headers : {
   'Content-Type': 'application/json'
 },
 // fetch expects the body as a string, so the payload has to be serialized
 body: JSON.stringify({
   input: {
     text
   },
   voice: {
     languageCode: 'en-US',
     name: 'en-US-Standard-B',
     ssmlGender: 'FEMALE'
   },
   audioConfig: {
     audioEncoding: 'MP3',
   }
 }),
 method: 'POST'
})
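If you want to switch between voices without editing the function, the voice can be passed as an argument. The following is a sketch of such a variant, not the only way to do it: the names createRequestWithVoice and DEFAULT_VOICE are hypothetical, and the body is serialized with JSON.stringify because fetch expects a string.

```javascript
// Default voice used when the caller does not pass one.
const DEFAULT_VOICE = { languageCode: 'en-US', name: 'en-US-Standard-B' }

// Sketch: build fetch options for the text:synthesize endpoint
// with a configurable voice.
const createRequestWithVoice = (text, voice = DEFAULT_VOICE) => ({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    input: { text },
    voice,
    audioConfig: { audioEncoding: 'MP3' },
  }),
})

// Possible usage (the voice name is an example, check the voice list first):
// createRequestWithVoice('Hello', { languageCode: 'en-US', name: 'en-US-Wavenet-D' })
```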

The createRequest function has some settings for voice and config for audio type, but the most important thing is the “text” argument, which is our text we want Google to say. Now we can create a function which will fetch our mp3.


import Config from 'react-native-config'
import { Platform } from 'react-native'
 
const speech = async (text) => {
 const key = Platform.OS === 'ios' ? Config.KEY_IOS : Config.KEY_ANDROID
 const address = `https://texttospeech.googleapis.com/v1/text:synthesize?key=${key}`
 const payload = createRequest(text)
 try {
   const response = await fetch(`${address}`, payload)
   const result = await response.json()
   console.log(result)
 } catch (err) {
   console.warn(err)
 }
}

As you can see, we are using react-native-config to read the API key depending on the current platform, building the request with the createRequest function and the passed text, and then fetching from the Google Text to Speech API. It should log base64 encoded data to the console. To try it, just run:

speech('Hello world!')
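When the request is rejected, for example because of an invalid key, the parsed JSON contains an error object instead of audioContent. Below is a minimal sketch of a check for that, assuming the standard Google API error shape with a message field. The helper name extractAudioContent is hypothetical.

```javascript
// Sketch: return the base64 audio from a parsed response
// or throw a descriptive error.
const extractAudioContent = (result) => {
  if (result && result.error) {
    throw new Error(`Text to Speech request failed: ${result.error.message}`)
  }
  if (!result || typeof result.audioContent !== 'string') {
    throw new Error('Text to Speech response has no audioContent')
  }
  return result.audioContent
}

// Possible usage inside the speech function (not executed here):
// const audio = extractAudioContent(await response.json())
```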

If we want to play the audio on our device, we first need to decode the base64 data and save it to a file. We can use the react-native-fs library for that, so let’s create a createFile function:


const RNFS = require('react-native-fs')
 
const createFile = async (path, data) => {
 try {
   return await RNFS.writeFile(path, data, 'base64')
 } catch (err) {
   console.warn(err)
 }
 
 return null
}

We pass the path as the first argument and our base64 encoded data as the second. The function saves the data in our device storage as an mp3, which can be played by the React-Native-Sound library. So right now we need a function to play our sound:


const Sound = require('react-native-sound')
 
const playMusic = (music) => {
 const speech = new Sound(music, '', (error) => {
   if (error) {
     console.warn('failed to load the sound', error)
 
     return null
   }
   speech.play((success) => {
     if (!success) {
       console.warn('playback failed due to audio decoding errors')
     }
   })
 
   return null
 })
}

We need to pass the previously created path as an argument to the playMusic function.
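If you prefer to await the end of playback, the same callbacks can be wrapped in a Promise. This is only a sketch under the assumption that the Sound constructor, play, and release behave as in React-Native-Sound. The helper name playFile is hypothetical, and the class is passed in as a parameter to keep the example self-contained.

```javascript
// Sketch: play a file and resolve when playback has finished.
// `SoundClass` is expected to be the class exported by react-native-sound.
const playFile = (SoundClass, path) =>
  new Promise((resolve, reject) => {
    const sound = new SoundClass(path, '', (error) => {
      if (error) {
        reject(error)
        return
      }
      sound.play((success) => {
        // Free the native player once playback has ended
        sound.release()
        if (success) {
          resolve()
        } else {
          reject(new Error('playback failed due to audio decoding errors'))
        }
      })
    })
  })

// Possible usage inside an app (not executed here):
// await playFile(Sound, path)
```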
Now we can connect our code and try our new speaking service:


const speech = async (text) => {
 const key = Platform.OS === 'ios' ? Config.KEY_IOS : Config.KEY_ANDROID
 const address = `https://texttospeech.googleapis.com/v1/text:synthesize?key=${key}`
 const payload = createRequest(text)
 const path = `${RNFS.DocumentDirectoryPath}/voice.mp3`
 try {
   const response = await fetch(`${address}`, payload)
   const result = await response.json()
   console.log(result)
   await createFile(path, result.audioContent)
   playMusic(path)
 } catch (err) {
   console.warn(err)
 }
}

Summary

In this article, we compared text to speech solutions: the native ones and Google Cloud Text to Speech. We also provided some information about using this feature in Expo-based projects and in plain React Native projects.

When choosing a text to speech solution, we have to decide which feature is the most important: is it using the service offline, or having the same voice on all platforms? It’s also possible to use both solutions, with the native one as a fallback, so feel free to experiment with the code!
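The fallback idea could look roughly like the sketch below. It assumes that cloudSpeak returns a Promise that rejects when the cloud request or playback fails (unlike the speech function in this article, which only logs the error), and that nativeSpeak is something like Tts.speak. The names speakWithFallback, cloudSpeak, and nativeSpeak are hypothetical.

```javascript
// Sketch: try the cloud voice first and fall back to the native engine.
// Returns which engine ended up being used.
const speakWithFallback = async (text, { cloudSpeak, nativeSpeak }) => {
  try {
    await cloudSpeak(text)
    return 'cloud'
  } catch (err) {
    // For example no network connection: use the on-device engine instead
    nativeSpeak(text)
    return 'native'
  }
}

// Possible usage inside an app (not executed here):
// speakWithFallback('Hello', { cloudSpeak: speech, nativeSpeak: (t) => Tts.speak(t) })
```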
