obs-localvocal

mirror of https://github.com/occ-ai/obs-localvocal synced 2024-11-08 11:19:49 +00:00

Author	SHA1	Message	Date
Roy Shilkrot	e3c69518a7	Fix hangups and VAD segmentation (#157) * Fix hangups and VAD segmentation * feat: Add max_sub_duration field to transcription filter data * chore: Update VAD parameters for better segmentation accuracy * feat: Add segment_duration field to transcription filter data * feat: Optimize VAD processing for better performance * feat: Refactor token buffer thread and whisper processing The code changes involve refactoring the token buffer thread and whisper processing. The token buffer thread now uses the variable name `word_token` instead of `word` for better clarity. In the whisper processing, the log message format has been updated to include the segment number and token number. These changes aim to improve the performance and accuracy of VAD processing, as well as add new fields to the transcription filter data. * Refactor token buffer thread and whisper processing * refactor: Update translation context in transcription filter The code changes in this commit update the translation context in the transcription filter. The `translate_add_context` property has been changed from a boolean to an integer slider, allowing the user to specify the number of context lines to add to the translation. This change aims to provide more flexibility in controlling the context for translation and improve the accuracy of the translation output. * refactor: Update last_text variable name in transcription filter callbacks * feat: Add translation language utilities This commit adds a new file, `translation-language-utils.h`, which contains utility functions for handling translation languages. The `remove_start_punctuation` function removes any leading punctuation from a given string. This utility will be used in the translation process to improve the quality of the translated output. * feat: Update ICU library configuration and dependencies This commit updates the configuration and dependencies of the ICU library. The `BuildICU.cmake` file has been modified to use the `INSTALL_DIR` variable instead of the `ICU_INSTALL_DIR` variable for setting the ICU library paths. Additionally, the `ICU_IN_LIBRARY` variable has been renamed to `ICU_IN_LIBRARY` for better clarity. These changes aim to improve the build process and ensure proper linking of the ICU library. * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies * refactor: Update ICU library configuration and dependencies This commit updates the `BuildICU.cmake` file to set the `CFLAGS`, `CXXFLAGS`, and `LDFLAGS` environment variables to `-fPIC` for Linux platforms. This change aims to ensure that the ICU library is built with position-independent code, improving compatibility and security. Additionally, the `icuin` library has been renamed to `icui18n` to align with the naming convention. These updates enhance the build process and maintain consistency in the ICU library configuration.	2024-09-06 10:27:05 -04:00
Ruwen Hahn	12fa9dce65	Fix model downloader crash on shutdown (#155) * Fix `ModelDownloader` not being aware of child object deletions * Delete `ModelDownloader` after it's done processing Otherwise this is only deleted when OBS exits, effectively leaking memory	2024-08-20 16:31:04 -04:00
Ruwen Hahn	bdab41cafc	More offline test improvements (#153) * Protect logging with a mutex Main thread and worker thread output could get interleaved weirdly without this * Move segments.json saving to different thread This was taking a considerable amount of time, especially for longer input files, reducing overall utilization * Check whether offline test can push more data before waiting * Fix offline test with large files In ``` circlebuf_push_back( &gf->input_buffers[c], audio[c].data() + frames_count * frame_size_bytes, frames_size_bytes); ``` `frames_count * frame_size_bytes` would overflow with `int` on a 4 hour file; using `size_t` (on a 64 bit platform) fixes that	2024-08-14 09:28:33 -04:00
Ruwen Hahn	6cc88b1ead	Offline test improvements (#150) * look at the front of the whisper buffer instead of the back this should mostly not make a difference, but feels semantically more correct * Initialize `resampled_buffer` for offline tests * Read relevant audio bytes There are two issues here: 1. `line_size` may contain padding (didn't happen in my tests) 2. from: `2b5f000d3f`:/libavutil/frame.h#l405 > For audio, only linesize[0] may be set. For planar audio, each > channel plane must be the same size. * log running time in addition to local time * Run whisper test "as fast as possible" This kind of behaves like libobs, where each chunk of audio is inspected individually by VAD/whisper, until processing of either takes longer than the window length, in which case audio continues to stream in * Only ever send a single chunk of audio * Add additional files to tests copy command * Use condition variable to signal input thread if available * Only wait in whisper thread if input buffers are empty	2024-08-09 13:45:42 -04:00
Ruwen Hahn	09839bbf15	Store acceleration info in cmake cache (#151) This is to allow switching branches and rebuilding with the "same" settings from e.g. visual studio (which will re-run a bunch of CMake processing)	2024-08-09 13:44:15 -04:00
Roy Shilkrot	f65da7a97c	Update README.md	2024-08-05 20:25:51 -04:00
Roy Shilkrot	7707af0710	refactor: Add target SPM loading and decoding logic in translation mo… (#149) * refactor: Add target SPM loading and decoding logic in translation module * refactor: Update target SPM loading error handling in translation module	2024-08-02 22:36:09 -04:00
Ruwen Hahn	0592fa7d9d	Upgrade silero vad v5 (and some other changes) (#148) * Add accessor for VAD window size in samples * Feed buffered audio data to VAD in proper window sizes * Wake whisper thread whenever audio is received * Update silero VAD to v5 * Only reset VAD state between chunks of activity	2024-08-02 14:25:59 -04:00
Roy Shilkrot	a173e220c3	Use coreml/metal on Apple	2024-07-30 22:03:16 -07:00
Roy Shilkrot	5bca2ff595	refactor: Update buildspec.json to version 0.3.4	2024-07-31 00:42:34 -04:00
Roy Shilkrot	78907ea14d	refactor: Update whisper model path and enable hipBLAS acceleration (#146) * refactor: Update whisper model path and enable hipBLAS acceleration * refactor: Update whisper model path and enable hipBLAS acceleration * refactor: Update whisper model path and enable hipBLAS acceleration * refactor: Update whisper model path and enable hipBLAS acceleration * refactor: Update whisper model path and enable hipBLAS acceleration * refactor: Update whisper model path and enable CoreML acceleration	2024-07-31 00:40:36 -04:00
Roy Shilkrot	87c5a0a1ca	refactor: Prevent duplicate translation of sentences in send_sentence_to_translation (#145)	2024-07-29 21:04:28 -04:00
Roy Shilkrot	4e2f3def40	refactor: Avoid translating the same sentence twice in send_sentence_to_translation	2024-07-23 23:54:24 -04:00
Roy Shilkrot	c8c22fe5a0	Update README.md	2024-07-22 08:32:56 -04:00
Roy Shilkrot	73c91765a5	refactor: Update whisper model path and add flag for model loaded status	2024-07-19 20:59:55 -04:00
Roy Shilkrot	b3e4bfa33a	refactor: Enable partial transcription with a latency of 1000ms (#141) * refactor: Enable partial transcription with a latency of 1000ms * refactor: Update CMakePresets.json and buildspec.json - Remove the "QT_VERSION" variable from CMakePresets.json for all platforms - Update the "version" of "obs-studio" and "prebuilt" dependencies in buildspec.json - Update the "version" of "qt6" dependency in buildspec.json - Update the "version" of the project to "0.3.3" in buildspec.json - Update the "version" of the project to "0.3.3" in CMakePresets.json - Remove unused code in whisper-processing.cpp * refactor: Add -Wno-error=deprecated-declarations option to compilerconfig.cmake * refactor: Update language codes in translation module	2024-07-19 14:02:24 -04:00
Roy Shilkrot	19017ca17f	refactor: Update language codes in translation module (#140)	2024-07-18 01:00:09 -04:00
Roy Shilkrot	4e3fdcd6ef	fix whisper model loading language (#139) * refactor: Add boolean flag for whisper model loaded status * refactor: Improve handling of whisper model paths in transcription filter * refactor: Update whisper model path and add flag for model loaded status	2024-07-17 18:54:34 -04:00
Roy Shilkrot	44f072b5ff	refactor: Add transcription-filter-properties.cpp for managing filter… (#138) * refactor: Add transcription-filter-properties.cpp for managing filter properties * refactor: Add translation_monitor to transcription filter - Add translation_monitor to the transcription filter data structure - Initialize and stop the translation_monitor in the transcription_filter_update function - Update the send_caption_to_source function to use the translation_monitor for sending translated captions - Clear the translation_monitor when disabling buffered output in the transcription_filter_update function * refactor: Simplify UI and improve error handling in transcription filter	2024-07-17 12:18:31 -04:00
Roy Shilkrot	3c3b640bdb	Simplified UI (#136) * refactor: Update translation option in transcription filter - Update the translation option in the transcription filter to use a more concise label - Remove unnecessary code related to file output in the transcription filter - Improve the handling of whisper model paths in the transcription filter - Set the default language to "auto" in the transcription filter properties * refactor: Improve error handling in model-downloader.cpp and transcription-filter-callbacks.cpp * refactor: Improve error handling in model-downloader.cpp and transcription-filter-callbacks.cpp	2024-07-15 18:28:03 -04:00
Roy Shilkrot	58f9131a05	refactor: Update model-downloader.cpp to use obs_module_config_path f… (#134) * refactor: Update model-downloader.cpp to use obs_module_config_path for retrieving the config folder path - Replace the usage of obs_module_get_config_path with obs_module_config_path to retrieve the config folder path in model-downloader.cpp - Add a check for a null config_folder and log an info message if it is null - Convert the config_folder string to a wstring on Windows using MultiByteToWideChar - Update the log messages to provide more descriptive information about the config models folder and the model folder existence in the config folder - Use the updated config_folder_str in the std::filesystem::absolute function call * Trigger Build * refactor: Update model-downloader.cpp to use obs_module_config_path for retrieving the config folder path * refactor: Fix bug in transcription filter callbacks - Add a condition to check for null timestamps before saving the sentence to srt in the send_sentence_to_file function - Remove unnecessary code in the set_text_callback function that checks for empty text after suppression - Update the whisper_loop function to clear the current subtitle if the minimum subtitle duration has passed	2024-07-11 12:22:16 -04:00
Roy Shilkrot	234a938f33	refactor: Update create_obs_text_source function to create source only if it doesn't exist (#126)	2024-07-09 17:26:47 -04:00
Roy Shilkrot	ee07bbe569	refactor: Update file output option in transcription filter (#128) - Update the file output option in the transcription filter to use the new "Save to File" label instead of "Text File output" - Add a new boolean flag "save_to_file" in the transcription filter data structure to track the file output setting - Update the code in transcription-filter-callbacks.cpp and transcription-filter.cpp to use the new flag for file output logic - Update the properties and UI in transcription-filter-properties.cpp to reflect the changes	2024-07-09 17:02:58 -04:00
Roy Shilkrot	34d908505c	bump v0.3.2	2024-07-02 15:35:25 -04:00
Roy Shilkrot	32bbd99404	refactor: Add filter-replace-dialog.cpp for filter and replace functi… (#124) * refactor: Add filter-replace-dialog.cpp for filter and replace functionality * refactor: Improve filter-replace-dialog.cpp for filter and replace functionality	2024-07-02 15:27:11 -04:00
Roy Shilkrot	a2244c2157	refactor: Update TokenBufferThread to use TokenBufferString for sentence output (#122)	2024-07-01 22:00:01 -04:00
Roy Shilkrot	958266fb4e	refactor: Update buffer_output_type translations in locale files (#119) * refactor: Update buffer_output_type translations in locale files * refactor: Update buffer_num_chars_per_line translation in locale files * refactor: Remove unused code related to buffer output type selection * refactor: Update TokenBufferThread to use TokenBufferString for caption building * refactor: Update TokenBufferThread to use TokenBufferString for caption building	2024-06-26 15:35:43 -04:00
Roy Shilkrot	db13750891	refactor: Update whisper model path handling in transcription filter (#117) * refactor: Update whisper model path handling in transcription filter * refactor: Set default language to "auto" in transcription filter properties	2024-06-23 21:54:35 -04:00
Roy Shilkrot	d64ec2ac11	chore: Update version to 0.3.1 in buildspec.json	2024-06-11 18:33:43 -04:00
Roy Shilkrot	2aa151eb22	Start and stop based on filter enable status (#111) * refactor: Add initial_creation flag to transcription filter data * refactor: Improve caption duration calculation in set_text_callback	2024-06-11 17:32:18 -04:00
Roy Shilkrot	91c2842009	refactor: Update timestamp variable name in transcription-filter-data.h (#109)	2024-06-11 12:15:49 -04:00
Roy Shilkrot	845c1a813c	English language selection by model (#108) * refactor: Improve remove_leading_trailing_nonalpha function in transcription-utils.cpp * refactor: Set whisper language to English in transcription filter properties	2024-06-11 09:05:30 -04:00
Roy Shilkrot	ecb3dfce09	Add more whisper models (#107) * Add more whisper models * refactor: Improve remove_leading_trailing_nonalpha function in transcription-utils.cpp	2024-06-10 21:24:55 -04:00
Roy Shilkrot	052de4f474	readme update	2024-06-06 01:03:47 -04:00
Roy Shilkrot	93ab51e932	chore: Update version to 0.3.0 in buildspec.json	2024-06-05 18:31:00 -04:00
Roy Shilkrot	67993f393d	Steamline and refactor (#105) * refactor: Update whispercpp dependency to version 0.0.3 * refactor: Add buffered output parameters for transcription filter * refactor: Remove unused parameter in set_source_signals function * refactor: Fix character splitting bug in TokenBufferThread * refactor: Update buffer size and overlap size in whisper-processing.cpp * refactor: Remove unused parameter in set_source_signals function * refactor: Fix floating point precision issue in whisper-processing.cpp * refactor: Improve remove_leading_trailing_nonalpha function in transcription-utils.cpp * refactor: Update VAD threshold in transcription filter * refactor: Update VAD threshold parameter name in silero-vad-onnx.h * refactor: Update VAD threshold parameter name in silero-vad-onnx.h * refactor: Update lock_guard parameter name in TokenBufferThread	2024-06-05 18:02:36 -04:00
Roy Shilkrot	9ecd759968	refactor: Update whispercpp dependency to version 0.0.3 (#103)	2024-05-30 23:00:04 -04:00
Tabitha Cromarty	a0fc46aa29	Add non-Ubuntu Linux build instructions to the README (#102) Largely based on @umireon's [AUR build script](https://github.com/occ-ai/obs-localvocal/issues/62#issuecomment-1910708241), these steps worked for me on a Gentoo Linux system as well, so I figured it might be helpful to add them to the README. I feel like some of this could be merged with the Ubuntu section above (which might also in itself be applicable to Debian as well as Ubuntu), but for now this should at least help people	2024-05-30 22:37:03 -04:00
Roy Shilkrot	5227a437b6	VAD based segmentation (#97) * refactor: Add whisper_buffer to transcription_filter_data struct * refactor: Add sentence_psum_accept_thresh to transcription_filter_data struct * refactor: Update buffer size and overlap size in whisper-processing.cpp * refactor: Update buffer size and overlap size in whisper-processing.cpp * refactor: Add audio-file-utils.cpp for audio file handling * refactor: Update buffer size and overlap size in whisper-processing.cpp * refactor: Add external model option to translation settings * refactor: Add support for input tokenization style in translation settings * refactor: Update buffer size and overlap size in whisper-processing.cpp	2024-05-16 15:07:00 -04:00
Roy Shilkrot	9c45376d7a	Update version to 0.2.6 in buildspec.json	2024-05-11 08:46:50 -04:00
Roy Shilkrot	31c41a9574	Offline transcription accuracy tests (#96) * Update translation-utils.h, transcription-filter.h, whisper-model-utils.h, model-find-utils.h, and model-downloader.h * Update create_context function to include ct2ModelFolder parameter * fix: add fix_utf8 flag to transcription_filter_data struct * Update create_context function to include ct2ModelFolder parameter * Update read_text_from_file function to include join_sentences parameter * fix: Update VadIterator::reset_states to include reset_hc parameter * Update create_context function to include whisper_sampling_method parameter * Update tests README with additional configuration options * feat: Add function to find file in folder by regex expression * refactor: Improve text conditioning logic in transcription-filter.cpp * refactor: Improve text conditioning logic in transcription-filter.cpp * chore: Update ctranslate2 dependency to version 1.2.0 * refactor: Improve text conditioning logic in transcription-filter.cpp * chore: Update cmake BuildCTranslate2.cmake to disable -Wno-comma warning * refactor: Update translation context in whisper-processing.cpp and translation-utils.cpp	2024-05-10 17:37:09 -04:00
Roy Shilkrot	2e83300fbb	Update buffer size and overlap size in whisper-processing.h and defau… (#95) * Update buffer size and overlap size in whisper-processing.h and default buffer size in msec in transcription-filter.cpp * Update audio processing timestamp calculation in whisper-processing.cpp * Update OBS plugin installation instructions for Linux * Fix typo in update_whisper_model function name	2024-05-02 01:03:06 -04:00
Roy Shilkrot	493ecad254	Update CTranslate2 and cpu_features dependencies (#94) * Update CTranslate2 and cpu_features dependencies * Update CTranslate2 and cpu_features dependencies * Update dependencies and fix special tokens handling * Add BUILD_BYPRODUCTS to CMake build command * Update version to 0.2.5 in buildspec.json	2024-04-30 09:48:23 -04:00
Roy Shilkrot	3b955e3031	Fix special tokens (#93) * Update version to 0.2.4 in buildspec.json * Update special token handling in whisper-processing.cpp * Update special token handling in whisper-processing.cpp	2024-04-26 15:34:18 -04:00
Roy Shilkrot	f36f6ec96c	Update version to 0.2.3 in buildspec.json	2024-04-25 17:15:14 -04:00
Roy Shilkrot	ab1b74a35c	Overlap analysis (#92) * Update buffer size and overlap size in whisper-processing.h and default buffer size in msec in transcription-filter.cpp * Update buffer size and overlap size in whisper-processing.h and default buffer size in msec in transcription-filter.cpp * Update suppress_sentences in en-US.ini and transcription-filter-data.h * Update suppress_sentences and fix whitespace in transcription-filter-data.h, whisper-processing.h, transcription-utils.cpp, and transcription-filter.h * Update whisper-processing.cpp and whisper-utils.cpp files * Update findStartOfOverlap function signature to use int instead of size_t * Update Whispercpp_Build_GIT_TAG to use commit 7395c70a748753e3800b63e3422a2b558a097c80 in BuildWhispercpp.cmake * Update buffer size and overlap size in whisper-processing.h and default buffer size in msec in transcription-filter.cpp * Update unused parameter in transcription-filter-properties function * Update log level and add suppress_sentences feature in transcription-filter.cpp and whisper-processing.cpp * Add translation output feature in en-US.ini and transcription-filter-data.h * Add DTW token timestamps and buffered output feature * trigger rebuild * Refactor remove_leading_trailing_nonalpha function to improve readability and performance * Refactor is_lead_byte and is_trail_byte macros for improved readability and maintainability * Refactor is_lead_byte and is_trail_byte macros for improved readability and maintainability * trigger build	2024-04-25 17:14:13 -04:00
Roy Shilkrot	65da380f9f	Bump whisper, clblast, add buffered output (#90) * Bump whisper, clblast, add buffered output * Update CPU_OR_CUDA environment variable error messages * Update Cublas validation in Package-Windows.ps1 and initialize function in captions-thread.h * Update Cublas validation and fix typo in Package-Windows.ps1 * Update default whisper model path to Whisper Tiny English (74Mb) * Update translation strings for multiple locales	2024-04-18 10:28:32 -04:00
Kaito Udagawa	e5a10f48cc	Fix add_custom_command to accept the argument with paren (#88) * Update FetchOnnxruntime.cmake * Update FetchOnnxruntime.cmake	2024-04-15 21:38:46 -04:00
Kaito Udagawa	f4307168de	Update build scripts according to the latest obs-plugintemplate (#87) * Update build-project.yaml * Update action.yaml * Update helpers_common.cmake * Update compilerconfig.cmake * Update .clang-format * Fix * Fix * Update build-project.yaml * Update check-format.yaml * Update push.yaml * Update build-project.yaml	2024-04-15 08:19:40 -04:00
Roy Shilkrot	f79571f316	Add Silero VAD (#85) * Add Silero VAD model and integrate it into the transcription filter * Fix Silero VAD model path and enable n_threads * Update translation strings for multiple locales * Update Onnxruntime library linking and fix compiler warning * Fix variable naming and type casting in Silero VAD implementation * Update Silero VAD model path and enable n_threads	2024-04-13 22:39:28 -04:00

200 Commits