So, I have just passed the midpoint of my capstone project log, and it is about time to implement the main function of my application: ‘STT’.
STT is short for ‘Speech-To-Text’, and as the name suggests, it converts a user’s voice input into text. I first tried to implement it with a Flutter package, but the quality was nowhere near good enough for the final version of the application. Therefore, I searched for other STT APIs, and I am going to summarize the results here with each product’s pros and cons.
The first one is the ‘Deepgram Speech-to-Text’ API. According to the Opus Research report ‘2022 State of Voice Technology’, this API was selected as the best STT API so far. It has the most accurate voice analysis technology, and it runs in real time. Its accuracy is about 90%, and it also has the fastest ASR (automatic speech recognition). However, it has one big weakness: it does not support many languages yet.
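To make a figure like "about 90% accuracy" easier to check against my own recordings, here is a minimal sketch in Python. It assumes that accuracy means one minus the word error rate (WER), which is a common convention but not something the report confirms. The function name `word_error_rate` and the sample sentences are my own, for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation is not normalized.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair: one wrong word out of ten.
    spoken = "please turn on the light in the living room now"
    transcribed = "please turn on the light in the living room no"
    wer = word_error_rate(spoken, transcribed)
    print(f"WER: {wer:.2f}, accuracy: {1 - wer:.2f}")
```

Running the same short test set through each candidate API and comparing the WER values would give a rough comparison that does not depend on vendor-reported numbers.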
The second one is ‘Amazon Transcribe.’ Like many of the services Amazon has already published, it provides high-quality STT technology based on its global big data. It supports various languages, and it is really easy to integrate into the alpha version of the source code if that code is already built on the AWS ecosystem. The one thing that is not so good is that it has the highest price: for general-purpose use, it costs $1.44 per hour of audio.
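To get a feel for what $1.44 per hour means for a project budget, here is a rough sketch in Python. It assumes simple linear pricing at the rate quoted above and ignores any free tier, minimum charge per request, or volume discount, none of which I have checked. The function name `estimate_cost` and the usage numbers are hypothetical.

```python
PRICE_PER_HOUR_USD = 1.44  # general-purpose rate quoted in this post


def estimate_cost(audio_seconds: float,
                  price_per_hour: float = PRICE_PER_HOUR_USD) -> float:
    """Estimate the transcription cost in USD, assuming linear pricing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    return audio_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    # Hypothetical usage: 200 voice inputs of 15 seconds each during testing.
    total_seconds = 200 * 15
    print(f"Estimated cost: ${estimate_cost(total_seconds):.2f}")
```

At this rate one minute of audio works out to about $0.024, so short voice commands stay cheap even when the hourly price looks high.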
The last one is ‘Google Speech-to-Text.’ Many of its pros are very similar to those of AWS. It has a strong brand name, and it is also easy to integrate with applications based on the Google ecosystem. However, compared with the two options above, it has poor accuracy and slow speed at a high cost.
So far, I have almost decided to choose AWS for my project, but the research itself was really interesting and worthwhile.
Most of this content is drawn from an article on the Deepgram blog. I will leave the address here: https://blog.deepgram.com/best-speech-to-text-apis/