Jan 18 2026
 

Find it here.

This version will use your GPU and is about 3 to 4 times faster, although quality is slightly lower than on CPU.

I am getting good results (i.e. fewer hallucinations and loops) with the parameters below.

If the idea is to use the transcript to have an AI turn it into minutes or a summary, it is plenty acceptable.

In your prompt, mention that the transcript was generated by an AI and may contain hallucinations, loops, etc., and also provide a bit of context (it was a meeting about X and Z with person/function A and person/function B): this usually gives very good results.
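A small sketch of how such a prompt could be assembled; the wording, function name and participant placeholders are my own illustration, not a fixed recipe:

```python
def build_minutes_prompt(transcript: str, topic: str, participants: list[str]) -> str:
    """Assemble a summarization prompt following the advice above:
    warn the AI about transcription artifacts, then give meeting context."""
    header = (
        "The following transcript was generated by an AI speech-to-text model "
        "and may contain hallucinations, repeated loops, or mis-heard words.\n"
        f"Context: this was a meeting about {topic}, "
        f"with participants: {', '.join(participants)}.\n"
        "Please produce concise meeting minutes.\n\n"
    )
    return header + transcript

prompt = build_minutes_prompt(
    "...transcript text...",
    "the Q3 roadmap",
    ["Alice (PM)", "Bob (engineering)"],
)
```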

whisper-cli.exe -m "../ggml-base.bin" -f "../output.wav" -osrt --language fr --max-context 50 --beam-size 3 --temperature 0 --temperature-inc 0.2 --threads 8 --split-on-word
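If you run this regularly, the same invocation can be wrapped in a small Python helper. The parameter values mirror the command above; the helper only builds the argument list, which you can then hand to `subprocess.run`:

```python
import subprocess

def whisper_cmd(model: str, wav: str, language: str = "fr") -> list[str]:
    """Build the whisper-cli command line used above as an argument list."""
    return [
        "whisper-cli.exe",
        "-m", model,
        "-f", wav,
        "-osrt",                      # write an .srt subtitle file
        "--language", language,
        "--max-context", "50",        # limit context to curb repetition loops
        "--beam-size", "3",
        "--temperature", "0",
        "--temperature-inc", "0.2",   # fallback sampling step if decoding fails
        "--threads", "8",
        "--split-on-word",
    ]

cmd = whisper_cmd("../ggml-base.bin", "../output.wav")
# subprocess.run(cmd, check=True)  # uncomment to actually run the transcription
```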

Jan 14 2026
 

I have run some benchmarks on a 22-minute WAV file (a radio interview from 1971, medium recording quality).

I then asked Copilot to assess the quality of each transcript.

According to this assessment, the GPU version + small model is the sweet spot (CPU + small being the best for quality).

For the record, the processing times (in seconds) are below.

Device  Model  Time (s)
CPU     tiny   174
CPU     base   248
CPU     small  796
GPU     tiny    26
GPU     base    43
GPU     small  120
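From those numbers, the GPU speedup per model size can be computed directly (times in seconds, taken from the table above):

```python
times = {  # seconds, from the benchmark above
    ("CPU", "tiny"): 174, ("CPU", "base"): 248, ("CPU", "small"): 796,
    ("GPU", "tiny"): 26,  ("GPU", "base"): 43,  ("GPU", "small"): 120,
}

# CPU time divided by GPU time for each model size
speedup = {
    model: times[("CPU", model)] / times[("GPU", model)]
    for model in ("tiny", "base", "small")
}
for model, ratio in speedup.items():
    print(f"{model}: {ratio:.1f}x faster on GPU")
# tiny: 6.7x, base: 5.8x, small: 6.6x
```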

Beware that this is very hardware dependent. Quality-wise, I still have mixed feelings about GPU for now, although the processing-time/quality ratio (on my hardware) clearly favours GPU + small.

To be continued.

Jan 11 2026
 

Because at work I cannot enjoy the full Copilot version, and therefore cannot get transcripts (and minutes) from my Teams meetings, I decided to give it a go with other tools.

I then discovered Whisper: "Whisper is an open-source speech-to-text model designed by OpenAI to deliver accurate transcription across many languages. It handles real-world audio remarkably well, even when recordings include background noise, accents, or imperfect microphone quality. Because it runs locally, it offers strong privacy and full control over your workflow. Its different model sizes—from tiny to large—let you balance speed and accuracy depending on your hardware and needs."

Models can be downloaded here: the tiny, base and small models already give really good results (before even considering medium and large).

It is all about finding the right balance between processing time and accuracy.

Command line:

whisper-cli.exe --model "ggml-base.bin" --file "interview.wav" --output-txt --output-file "interview.txt" --threads 4 --language auto
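To transcribe several recordings in a row, the same command can be looped over a folder. This is a sketch with my own helper name; it mirrors the flags of the command above and writes one text file next to each WAV:

```python
import pathlib
import subprocess

def transcribe_folder(folder: str, model: str = "ggml-base.bin") -> list[list[str]]:
    """Build one whisper-cli command per .wav file found in the folder."""
    cmds = []
    for wav in sorted(pathlib.Path(folder).glob("*.wav")):
        cmds.append([
            "whisper-cli.exe",
            "--model", model,
            "--file", str(wav),
            "--output-txt",
            "--output-file", str(wav.with_suffix(".txt")),  # same pattern as above
            "--threads", "4",
            "--language", "auto",
        ])
    return cmds

# for cmd in transcribe_folder("recordings"):
#     subprocess.run(cmd, check=True)  # uncomment to actually transcribe
```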

I strongly recommend also looking at this GUI version here: it uses an older Whisper version which, at first look, delivers better results (it uses the GPU) compared to the latest ggml-org whisper versions: standard, BLAS (preferred) or cuBLAS (you need CUDA installed and an NVIDIA card).

I might give OpenVINO a try in the near future (time to set up the proper Python environment, scripts, models, etc.).