How to configure Speech-To-Text in a Curiosity Workspace
Speech-to-Text (STT), also known as speech recognition, converts spoken language into written text. In Curiosity, it turns the spoken content of audio and video files into searchable text.
Curiosity's built-in Speech-to-Text is based on OpenAI's open-source Whisper models. Key features include:
Audio File Processing: Curiosity can process audio files in formats such as MP3, WAV, and FLAC, converting spoken words into searchable text. Users can then search the spoken content and jump to the matching position in the audio file.
Video File Processing: Curiosity also makes video files in formats such as MP4 searchable. Users can search the spoken content and jump to the matching position in the video file.
Multi-Language Support: Curiosity STT can recognize and transcribe a broad range of languages, such as English, French, Spanish, and German.
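Jumping to "the right place" in a recording works because a transcript keeps timestamps, not just text. Curiosity's internal transcript format is not documented on this page, so the sketch below assumes each transcript is a list of timestamped segments, similar to the `start`/`end`/`text` segments that Whisper produces, and shows how a text match maps back to a playback offset. `Segment`, `find_in_transcript`, and `format_timestamp` are hypothetical names used only for illustration; this is not Curiosity's search implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span (hypothetical shape): start/end offsets in seconds plus its text."""
    start: float
    end: float
    text: str


def find_in_transcript(segments, query):
    """Return (start_seconds, text) for every segment containing the query, case-insensitively."""
    needle = query.casefold()
    return [(s.start, s.text) for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds):
    """Format an offset in seconds as H:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Made-up example transcript for illustration.
transcript = [
    Segment(0.0, 4.2, "Welcome to the quarterly review."),
    Segment(4.2, 9.8, "Revenue grew in the European market."),
    Segment(9.8, 15.1, "Next, the product roadmap."),
]

# Each hit carries the offset a player would seek to.
for start, text in find_in_transcript(transcript, "european"):
    print(format_timestamp(start), text)  # prints "0:00:04 Revenue grew in the European market."
```

A real search index would match tokens or use relevance ranking rather than plain substring checks; the point here is only that every hit carries the segment's start offset, which is what a player needs to seek to.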
Curiosity supports Speech-to-Text for the following file types:
Video files (.mp4, .wmv, .mpeg, .avi, .mkv, .mov, .ogv, .3gp, .webm, .flv)
Audio files (.mp3, .wav, .mka, .wma, .flac, .aac, .aiff, .m4a, .oga, .weba)
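If you prepare files for ingestion yourself, a simple pre-check can tell you which ones have an extension from the list above. This is a minimal, hypothetical helper (`is_transcribable` is not part of Curiosity), and it assumes support is decided by file extension alone, compared case-insensitively; Curiosity's actual detection logic is not described on this page.

```python
from pathlib import Path

# All extensions from the supported video and audio lists above, combined.
# Assumption: support depends only on the extension, not on the codecs inside the file.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".wmv", ".mpeg", ".avi", ".mkv", ".mov", ".ogv", ".3gp",
    ".m4a", ".oga", ".weba", ".webm", ".flv",
    ".mp3", ".wav", ".mka", ".wma", ".flac", ".aac", ".aiff",
}


def is_transcribable(path):
    """Return True if the file's extension (case-insensitive) is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


files = ["meeting.MP4", "notes.pdf", "podcast.mp3"]
print([f for f in files if is_transcribable(f)])  # prints ['meeting.MP4', 'podcast.mp3']
```

A single combined set keeps the check independent of whether a format is grouped as audio or video. An extension check cannot catch files whose container is supported but whose audio track is missing or unreadable.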
Documentation coming soon...