Audio and Video Search

How to configure Speech-to-Text in a Curiosity Workspace

What is Speech-to-Text?

Speech-to-Text (STT), also known as speech recognition, converts spoken language into written text. Curiosity uses it to turn the spoken content of audio and video files into searchable text.

Speech-to-Text Support in Curiosity

Curiosity has integrated Speech-to-Text capabilities based on the open-source Whisper models. Key features include:

  • Audio File Processing: Curiosity can process audio files in formats like MP3, WAV, and FLAC, converting spoken words into searchable text. Users can then search the transcribed speech and jump to the matching position in the audio file.

  • Video File Processing: Curiosity also processes video files in formats like MP4 to make them searchable. Users can search the transcribed speech and jump to the matching position in the video file.

  • Multi-Language Support: Curiosity STT can recognize and transcribe a broad range of languages, including English, French, Spanish, German, and others.

Curiosity showing a video (right) opened at the matching position after a Speech-to-Text search
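The search-and-jump behaviour described above can be illustrated with a small sketch. It assumes a transcript is available as a list of timestamped segments (start, end, text), which is the general shape of output produced by the open-source Whisper models. The `Segment` class, the `find_spoken` and `as_timestamp` helpers, and the sample transcript are hypothetical and only illustrate the idea; they are not part of Curiosity's API and do not show how Curiosity implements search internally.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk of speech (hypothetical structure)."""
    start: float  # seconds from the beginning of the media file
    end: float    # seconds from the beginning of the media file
    text: str     # transcribed words for this chunk


def find_spoken(segments, query):
    """Return (start_seconds, text) for every segment containing the query."""
    needle = query.casefold()
    return [(s.start, s.text) for s in segments if needle in s.text.casefold()]


def as_timestamp(seconds):
    """Format a position in seconds as H:MM:SS, e.g. for seeking a player."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up transcript used only for illustration.
segments = [
    Segment(0.0, 4.2, "Welcome to the quarterly review."),
    Segment(4.2, 9.8, "First, let's look at the budget figures."),
    Segment(75.5, 81.0, "The budget for next year is still open."),
]

hits = find_spoken(segments, "budget")
for start, text in hits:
    # Each hit carries the position a player would seek to.
    print(as_timestamp(start), text)
```

Keeping the start time next to each piece of text is what makes it possible to open the media file at the place where the matching words are spoken, rather than only finding the file as a whole.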

Supported file types

Curiosity supports Speech-to-Text for the following file types:

  • Video files (.mp4, .wmv, .mpeg, .avi, .mkv, .mov, .ogv, .3gp, .m4a, .oga, .weba, .webm, .flv)

  • Audio files (.mp3, .wav, .mka, .wma, .flac, .aac, .aiff)
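Before uploading a batch of files, it can be handy to check which ones fall into the lists above. The following is a minimal sketch of such a check. The extension sets are copied from the lists above, grouped exactly as they are listed there; the `stt_category` function is a hypothetical helper written for this example and is not part of Curiosity.

```python
from pathlib import Path

# Extensions copied from the supported file type lists above.
VIDEO_EXTENSIONS = {
    ".mp4", ".wmv", ".mpeg", ".avi", ".mkv", ".mov", ".ogv",
    ".3gp", ".m4a", ".oga", ".weba", ".webm", ".flv",
}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".mka", ".wma", ".flac", ".aac", ".aiff"}


def stt_category(filename):
    """Return "video", "audio", or None based on the file extension only."""
    ext = Path(filename).suffix.lower()  # case-insensitive comparison
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return None


for name in ["Townhall.MP4", "interview.flac", "notes.pdf"]:
    print(name, "->", stt_category(name))
```

The check looks only at the file name, not at the file contents, so it is a quick pre-filter rather than a guarantee that a given file can be transcribed.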

Configuring Speech-to-Text in a Curiosity Workspace

Documentation coming soon...
