Speech recognition powered by IBM Watson artificial intelligence uses grammar and language structure to better understand natural speech and return the most accurate transcripts possible. Watson continuously learns and retroactively updates the transcription as more speech is heard.
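To show what "retroactively updates" can look like on the client side, here is a minimal Python sketch of merging streaming results. It assumes messages shaped like the IBM Watson Speech to Text WebSocket interface. Each message carries a `result_index` and a list of `results`, and each result has a `final` flag and `alternatives`. The helpers `apply_update` and `render` are hypothetical and are not part of Callnote.

```python
from typing import Dict


def apply_update(transcript: Dict[int, dict], message: dict) -> None:
    """Merge one streaming recognition message into the running transcript.

    Assumes Watson-style messages:
    {"result_index": i, "results": [{"final": bool,
                                     "alternatives": [{"transcript": str}]}]}
    where results[k] belongs at position result_index + k.
    """
    start = message["result_index"]
    for offset, result in enumerate(message["results"]):
        # A newer hypothesis for the same position replaces the older one.
        transcript[start + offset] = {
            "text": result["alternatives"][0]["transcript"].strip(),
            "final": result["final"],
        }


def render(transcript: Dict[int, dict]) -> str:
    """Join the current best text for every position, in order."""
    return " ".join(transcript[i]["text"] for i in sorted(transcript))


# Usage: an interim guess is later replaced by the final result.
t: Dict[int, dict] = {}
apply_update(t, {"result_index": 0, "results": [
    {"final": False, "alternatives": [{"transcript": "wreck a nice "}]}]})
apply_update(t, {"result_index": 0, "results": [
    {"final": True, "alternatives": [{"transcript": "recognize speech "}]}]})
print(render(t))  # recognize speech
```

Later messages overwrite earlier hypotheses at the same index. This lets a tentative phrase be corrected once more audio has been heard.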

Who can benefit from Callnote’s video call recorder with speech-to-text transcription:

  • Writers, journalists and bloggers
  • Educators and students
  • Legal practices
  • Financial services
  • Healthcare
  • Media

Could you also benefit from recording, editing and transcribing your video calls?

Audio and video are becoming essential online communication tools. One problem, though, is that search engines can't index what is said in audio and video content: most pick up only your podcast's title, description, and tags. Creating transcripts of your content with Callnote can therefore be an SEO gold mine, because publishing the transcript from Callnote with Watson gives searchers access to everything said in your video.

Having a transcript gives you more than one way to disseminate your material. Maybe you want to turn your Google Hangouts podcast into a blog post, or a SlideShare, or even an infographic. You can also add hyperlinks to your transcript and post it on your blog or site.

How does it work?

  1. Select a recorded audio or video file
  2. Choose edit
  3. Click transcribe

Callnote with Watson then completes your transcript in less time than the original recording lasts.
The final transcript captures speech from all participants. By matching the speaker labels against the timestamps, you can reassemble the conversation in the order it actually occurred.
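Here is a minimal Python sketch of that reassembly. It assumes a JSON response shaped like IBM Watson Speech to Text output with speaker labels enabled. In that shape, per-word `timestamps` entries have the form `[word, start, end]`, and a top-level `speaker_labels` list has `from` values that match the word start times. The `reassemble` helper and the sample data are hypothetical. The helper sorts words by start time and groups consecutive words from the same speaker into turns.

```python
from typing import List, Tuple


def reassemble(response: dict) -> List[Tuple[int, float, str]]:
    """Return (speaker, start_time, text) turns in chronological order."""
    # Collect every word with its start time across all results.
    words = []
    for result in response["results"]:
        for word, start, _end in result["alternatives"][0]["timestamps"]:
            words.append((start, word))

    # Map each label's start time to its speaker; assumes the "from" values
    # are the same numbers as the word start times in "timestamps".
    speaker_at = {lab["from"]: lab["speaker"] for lab in response["speaker_labels"]}

    turns: List[Tuple[int, float, List[str]]] = []
    for start, word in sorted(words):
        speaker = speaker_at.get(start, -1)  # -1 means no matching label
        if turns and turns[-1][0] == speaker:
            turns[-1][2].append(word)  # same speaker keeps talking
        else:
            turns.append((speaker, start, [word]))  # speaker change: new turn
    return [(spk, t0, " ".join(ws)) for spk, t0, ws in turns]


sample = {
    "results": [
        {"alternatives": [{"timestamps": [["hello", 0.5, 0.9], ["there", 0.9, 1.2]]}]},
        {"alternatives": [{"timestamps": [["hi", 1.5, 1.7], ["how", 2.0, 2.2],
                                          ["are", 2.2, 2.3], ["you", 2.3, 2.5]]}]},
    ],
    "speaker_labels": [
        {"from": 0.5, "to": 0.9, "speaker": 0}, {"from": 0.9, "to": 1.2, "speaker": 0},
        {"from": 1.5, "to": 1.7, "speaker": 1}, {"from": 2.0, "to": 2.2, "speaker": 0},
        {"from": 2.2, "to": 2.3, "speaker": 0}, {"from": 2.3, "to": 2.5, "speaker": 0},
    ],
}

for speaker, start, text in reassemble(sample):
    print(f"Speaker {speaker} [{start:.1f}s]: {text}")
# Speaker 0 [0.5s]: hello there
# Speaker 1 [1.5s]: hi
# Speaker 0 [2.0s]: how are you
```

Grouping by consecutive speaker, rather than by result, means a turn can span several recognition results. A single result can also be split when a second participant interjects.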

Please note:

  • You need the Premium or Pro version to access this feature.
  • Free Trial:
    Each user gets a one-time allowance of 60 minutes of free transcription.

    Access to all features.
  • Payment:
    $0.04/minute for each minute above 60.

    No minimum.

    No additional charges.

    Refunds will not be provided for any subscription.
  • Turnaround:
    Transcripts back in minutes.

    Shorter files delivered faster.
  • Accuracy:
    Transcription accuracy depends on the quality of your audio. We speak imperfectly: we swallow words, stumble, and interrupt, and many of us speak with heavy accents. Poor audio caused by distant microphones, background noise, music, or room echo can make the content hard to decipher, and some proper names and technical terms won't be recognized.
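The pricing above reduces to simple arithmetic. This Python sketch computes the charge in cents. Rounding partial minutes up to whole minutes is an assumption, since the terms only say "per minute". `transcription_cost_cents` is a hypothetical helper, not a Callnote API.

```python
import math

FREE_MINUTES = 60          # one-time free allowance per user
RATE_CENTS_PER_MINUTE = 4  # $0.04 per minute beyond the allowance


def transcription_cost_cents(total_seconds: float,
                             free_minutes_left: int = FREE_MINUTES) -> int:
    """Charge in cents; partial minutes are rounded up (an assumption)."""
    minutes = math.ceil(total_seconds / 60)
    billable = max(0, minutes - free_minutes_left)
    return billable * RATE_CENTS_PER_MINUTE


# A 90-minute recording against the full free allowance bills 30 minutes.
print(transcription_cost_cents(90 * 60))  # 120 (that is, $1.20)
```

Working in integer cents avoids floating-point rounding errors in money. Once the free allowance is used up (`free_minutes_left=0`), the same 90-minute recording costs 360 cents, or $3.60.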


Number of speakers: 2 channels
File size: less than 100 MB per file. For bigger files, manually divide the file into smaller pieces.
Supported languages: Brazilian Portuguese, French, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English
Recognising different speakers in an audio file: US English, Spanish, or Japanese only
Price: 60 minutes free of charge, then $0.04 per minute for each additional minute
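The limits above can be checked in code before uploading. This Python sketch is hypothetical, and none of its names come from Callnote. It flags unsupported languages, languages without speaker recognition, and files at or above the size limit. For an oversized file, it estimates a segment length that keeps each piece under the limit. It reads "100Mb" as 100 MiB and assumes a constant bitrate, so that file size grows linearly with duration.

```python
from typing import List

MAX_FILE_BYTES = 100 * 1024 * 1024  # "100Mb" read as 100 MiB (assumption)
SUPPORTED_LANGUAGES = {
    "Brazilian Portuguese", "French", "Japanese", "Mandarin Chinese",
    "Modern Standard Arabic", "Spanish", "UK English", "US English",
}
SPEAKER_LABEL_LANGUAGES = {"US English", "Spanish", "Japanese"}


def preflight(size_bytes: int, language: str) -> List[str]:
    """Return warnings about a file before it is sent for transcription."""
    warnings = []
    if language not in SUPPORTED_LANGUAGES:
        warnings.append(f"{language} is not supported")
    elif language not in SPEAKER_LABEL_LANGUAGES:
        warnings.append(f"speakers will not be labeled for {language}")
    if size_bytes >= MAX_FILE_BYTES:
        warnings.append("file must be split into smaller pieces")
    return warnings


def segment_seconds(size_bytes: int, duration_seconds: float,
                    limit: int = MAX_FILE_BYTES) -> float:
    """Segment length keeping each piece under the limit (constant bitrate assumed)."""
    pieces = size_bytes // limit + 1  # smallest count with size / pieces < limit
    return duration_seconds / pieces


# A 250 MiB, 50-minute recording needs 3 pieces of about 1000 s each.
print(segment_seconds(250 * 1024 * 1024, 3000))  # 1000.0
```

In practice, leave a margin below the computed length, because variable-bitrate files don't scale linearly. A tool such as ffmpeg's segment muxer (`-f segment -segment_time`) can then cut the file at that interval.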