
LocalVocal - Speech AI assistant OBS Plugin


Introduction

The LocalVocal live-streaming AI assistant plugin lets you transcribe speech into text locally on your machine and perform various language processing functions on the text using AI / LLMs (Large Language Models). No GPU required, no cloud costs, no network and no downtime! Privacy first: all data stays on your machine.

If this free plugin has been valuable to you, consider adding a ⭐ to this GH repo, rating it on OBS, subscribing to my YouTube channel where I post updates, and supporting my work on GitHub or Patreon 🙏

For a free, open, standalone captioning and translation tool, consider our LexiSynth, which also does speech synthesis.

Internally, the plugin runs a neural network (OpenAI Whisper) locally to transcribe speech in real time and provide captions. It uses the Whisper.cpp project from ggerganov to run the Whisper network efficiently on CPUs and GPUs.
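Real-time transcription means the continuous audio stream has to be cut into pieces the Whisper model can process. The sketch below is a minimal illustration, assuming fixed-length windows with a small overlap so that words falling on a boundary are not lost. The function name `chunk_with_overlap` and the window sizes are hypothetical; this is not LocalVocal's code, and the plugin's actual segmentation strategy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LocalVocal): split mono PCM samples into
// fixed-length windows, each overlapping the previous one by `overlap` samples.
// The final window may be shorter than `window` if the stream ends early.
std::vector<std::vector<float>> chunk_with_overlap(const std::vector<float> &pcm,
						   std::size_t window,
						   std::size_t overlap)
{
	std::vector<std::vector<float>> chunks;
	if (window == 0 || overlap >= window)
		return chunks; // step would be zero: refuse rather than loop forever

	const std::size_t step = window - overlap;
	for (std::size_t start = 0; start < pcm.size(); start += step) {
		const std::size_t end = std::min(start + window, pcm.size());
		chunks.emplace_back(pcm.begin() + static_cast<std::ptrdiff_t>(start),
				    pcm.begin() + static_cast<std::ptrdiff_t>(end));
		if (end == pcm.size())
			break; // the whole stream is covered
	}
	return chunks;
}
```

Whisper.cpp expects 16 kHz mono float samples, so, for example, a 3-second window with a 0.5-second overlap would be `chunk_with_overlap(pcm, 48000, 8000)`. Larger overlaps give the model more context at the cost of reprocessing more audio.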


Current Features:

  • Transcribe audio to text in real time in 100 languages
  • Display captions on screen using text sources
  • Send captions to a .txt or .srt file (to be read by external sources or used for video playback), with or without aggregation
  • Captions synced with OBS recording timestamps
  • Send captions on an RTMP stream to e.g. YouTube, Twitch
  • Bring your own Whisper model (any GGML)
  • Translate captions in real time to major languages (both Whisper built-in translation as well as NMT models with CTranslate2)
  • CUDA, OpenCL, Apple Arm64, AVX & SSE acceleration support
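The .srt output and the sync with recording timestamps both come down to expressing each caption's start and end time relative to the start of the recording, in the SubRip `HH:MM:SS,mmm` format. A minimal sketch, assuming caption and recording-start times are available as nanosecond counts from the same clock. The function names are hypothetical, and this is not the plugin's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: convert an absolute timestamp (ns) into milliseconds
// elapsed since the recording started; captions before the start clamp to 0.
std::uint64_t ms_since_recording_start(std::uint64_t t_ns, std::uint64_t recording_start_ns)
{
	return t_ns > recording_start_ns ? (t_ns - recording_start_ns) / 1000000 : 0;
}

// Format milliseconds as an SRT timestamp, e.g. 3723456 -> "01:02:03,456".
std::string srt_timestamp(std::uint64_t ms)
{
	std::ostringstream out;
	out << std::setfill('0') << std::setw(2) << ms / 3600000 << ':'
	    << std::setw(2) << (ms / 60000) % 60 << ':'
	    << std::setw(2) << (ms / 1000) % 60 << ','
	    << std::setw(3) << ms % 1000;
	return out.str();
}

// Build one numbered SRT cue: index, time range, caption text, then a blank line.
std::string srt_cue(int index, std::uint64_t start_ms, std::uint64_t end_ms,
		    const std::string &text)
{
	std::ostringstream out;
	out << index << '\n'
	    << srt_timestamp(start_ms) << " --> " << srt_timestamp(end_ms) << '\n'
	    << text << "\n\n";
	return out.str();
}
```

For example, `srt_cue(1, 1000, 2500, "Hello")` yields a cue that shows "Hello" from 00:00:01,000 to 00:00:02,500. Cues are appended to the file in order, with the index incrementing by one each time.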

Roadmap:

  • More robust built-in translation options
  • Additional output options: .vtt, .ssa, .sub, etc.
  • Speaker diarization (detecting speakers in a multi-person audio stream)

Check out our other plugins:

  • Background Removal removes background from webcam without a green screen.
  • Detect, which detects and tracks more than 80 types of objects in real time inside OBS
  • 🚧 Experimental 🚧 CleanStream for real-time filler word (uh, um) and profanity removal from a live audio stream
  • URL/API Source that allows fetching live data from an API and displaying it in OBS.
  • Polyglot translation AI plugin for real-time, local translation to hundreds of languages

Download

Check out the latest releases for downloads and install instructions.

Models

The plugin ships with the Tiny.en model and will automatically download other, bigger Whisper models selected from a dropdown. There is also an option to select an external model file if you already have one on disk.

Get more models from https://ggml.ggerganov.com/ and follow the instructions on whisper.cpp to create your own models or download others such as distilled models.
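Before selecting an external model file, a quick sanity check can catch an obviously wrong file. A minimal sketch, assuming the model uses the legacy GGML container used by whisper.cpp's `ggml-*.bin` files, which begins with the 32-bit magic number `0x67676d6c` stored little-endian. The helper name is hypothetical, and passing this check does not guarantee that the model will load.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true if the stream starts with the GGML magic 0x67676d6c
// stored as a little-endian 32-bit integer (the bytes 'l' 'm' 'g' 'g').
bool has_ggml_magic(std::istream &in)
{
	unsigned char bytes[4] = {};
	if (!in.read(reinterpret_cast<char *>(bytes), 4))
		return false; // fewer than 4 bytes: cannot be a GGML model
	const std::uint32_t magic = static_cast<std::uint32_t>(bytes[0]) |
				    (static_cast<std::uint32_t>(bytes[1]) << 8) |
				    (static_cast<std::uint32_t>(bytes[2]) << 16) |
				    (static_cast<std::uint32_t>(bytes[3]) << 24);
	return magic == 0x67676d6cu;
}
```

To check a file on disk, open it with `std::ifstream file(path, std::ios::binary)` (from `<fstream>`) and pass the stream to `has_ggml_magic`. Files in other container formats will fail this check even if whisper.cpp can load them.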

Building

The plugin was built and tested on Mac OSX (Intel & Apple silicon), Windows (with and without Nvidia CUDA) and Linux.

Start by cloning this repo to a directory of your choice.

Mac OSX

To build locally with the CI pipeline scripts, call the zsh script, which builds for the architecture specified in $MACOS_ARCH (either x86_64 or arm64):

$ MACOS_ARCH="x86_64" ./.github/scripts/build-macos -c Release

Install

The above script should succeed, and the plugin files (e.g. obs-localvocal.plugin) will be in the ./release/Release folder under the repository root. Copy the .plugin file to the OBS plugins directory, e.g. ~/Library/Application Support/obs-studio/plugins.

To get a .pkg installer file, run, for example:

$ ./.github/scripts/package-macos -c Release

(Note that the outputs may end up in the Release folder rather than the install folder that package-macos expects. If so, rename build_x86_64/Release to build_x86_64/install.)

Linux (Ubuntu)

To build on Linux, first clone the repo, then run the following from the repo directory:

$ sudo apt install -y libssl-dev
$ ./.github/scripts/build-linux

Copy the results to the standard OBS folders on Ubuntu:

$ sudo cp -R release/RelWithDebInfo/lib/* /usr/lib/
$ sudo cp -R release/RelWithDebInfo/share/* /usr/share/

Note: The official OBS plugins guide recommends adding plugins to the ~/.config/obs-studio/plugins folder. Which location works depends on how you installed OBS.

If the above doesn't work, try copying the files to the ~/.config folder instead:

$ mkdir -p ~/.config/obs-studio/plugins/obs-localvocal/bin/64bit
$ cp -R release/RelWithDebInfo/lib/x86_64-linux-gnu/obs-plugins/* ~/.config/obs-studio/plugins/obs-localvocal/bin/64bit/
$ mkdir -p ~/.config/obs-studio/plugins/obs-localvocal/data
$ cp -R release/RelWithDebInfo/share/obs/obs-plugins/obs-localvocal/* ~/.config/obs-studio/plugins/obs-localvocal/data/

Windows

Use the CI scripts again, for example:

> .github/scripts/Build-Windows.ps1 -Configuration Release

The build output will be in the ./release folder under the repository root. You can then manually install the files into the OBS directory:

> Copy-Item -Recurse -Force "release\Release\*" -Destination "C:\Program Files\obs-studio\"

Building with CUDA support on Windows

LocalVocal will now build with CUDA support automatically through a prebuilt binary of Whisper.cpp from https://github.com/occ-ai/occ-ai-dep-whispercpp. The CMake scripts will download all necessary files.

To build with CUDA, set the CPU_OR_CUDA environment variable (to cpu, 12.2.0, or 11.8.0) and build as usual:

> $env:CPU_OR_CUDA="12.2.0"
> .github/scripts/Build-Windows.ps1 -Configuration Release