{"id":2623,"date":"2026-01-11T18:41:37","date_gmt":"2026-01-11T17:41:37","guid":{"rendered":"https:\/\/labalec.fr\/erwan\/?p=2623"},"modified":"2026-01-14T20:39:04","modified_gmt":"2026-01-14T19:39:04","slug":"automatic-speech-recognition-asr-with-whisper","status":"publish","type":"post","link":"https:\/\/labalec.fr\/erwan\/?p=2623","title":{"rendered":"Automatic speech recognition (ASR) with Whisper"},"content":{"rendered":"\n<p>Because at work I cannot enjoy the full Copilot version, and therefore cannot get transcripts (and minutes) from my Teams meetings, I decided to give it a go with other tools.<\/p>\n\n\n\n<p>I then discovered <a href=\"https:\/\/github.com\/ggml-org\/whisper.cpp\" data-type=\"link\" data-id=\"https:\/\/github.com\/ggml-org\/whisper.cpp\" target=\"_blank\" rel=\"noreferrer noopener\">Whisper<\/a>: <em>\u00ab\u00a0Whisper is an open\u2011source speech\u2011to\u2011text model designed by OpenAI to deliver accurate transcription across many languages. It handles real\u2011world audio remarkably well, even when recordings include background noise, accents, or imperfect microphone quality. Because it runs locally, it offers strong privacy and full control over your workflow. 
Its different model sizes\u2014from tiny to large\u2014let you balance speed and accuracy depending on your hardware and needs.\u00a0\u00bb<\/em><\/p>\n\n\n\n<p>Models can be downloaded <a href=\"https:\/\/huggingface.co\/ggerganov\/whisper.cpp\/tree\/main\" data-type=\"link\" data-id=\"https:\/\/huggingface.co\/ggerganov\/whisper.cpp\/tree\/main\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>: the tiny, base, and small models already give really good results (before you even need to consider medium and large).<\/p>\n\n\n\n<p>It is all about finding the right balance between processing time and accuracy.<\/p>\n\n\n\n<p>Command line:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>whisper-cli.exe --model \"ggml-base.bin\" --file \"interview.wav\" --output-txt --output-file \"interview.txt\" --threads 4 --language auto<\/code><\/pre>\n\n\n\n<p>I also strongly recommend looking at the GUI version <a href=\"https:\/\/github.com\/Const-me\/Whisper\" data-type=\"link\" data-id=\"https:\/\/github.com\/Const-me\/Whisper\">here<\/a>: it uses an older Whisper version which, at first glance, delivers better results (it runs on the GPU) than the latest ggml-org whisper.cpp builds: standard, blas (preferred), or cublas (you need CUDA installed and an NVIDIA card).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/labalec.fr\/erwan\/wp-content\/uploads\/2026\/01\/image.png\"><img loading=\"lazy\" decoding=\"async\" width=\"696\" height=\"480\" src=\"https:\/\/labalec.fr\/erwan\/wp-content\/uploads\/2026\/01\/image.png\" alt=\"\" class=\"wp-image-2626\" srcset=\"https:\/\/labalec.fr\/erwan\/wp-content\/uploads\/2026\/01\/image.png 696w, https:\/\/labalec.fr\/erwan\/wp-content\/uploads\/2026\/01\/image-300x207.png 300w\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" \/><\/a><\/figure>\n\n\n\n<p>I might give OpenVINO a try in the near future (time to set up the proper Python environment, script, <a 
href=\"https:\/\/huggingface.co\/Intel\/whisper.cpp-openvino-models\/tree\/main\" data-type=\"link\" data-id=\"https:\/\/huggingface.co\/Intel\/whisper.cpp-openvino-models\/tree\/main\" target=\"_blank\" rel=\"noreferrer noopener\">models<\/a>, etc).<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Because at work I cannot enjoy the full copilot version and therefore cannot get transcripts (and minutes) from my teams meeting, i decided to give it a go with other tools. I then discovered Whisper : \u00ab\u00a0Whisper is an open\u2011source speech\u2011to\u2011text model designed by OpenAI to deliver accurate transcription across many languages. It handles real\u2011world <a href='https:\/\/labalec.fr\/erwan\/?p=2623' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[150],"tags":[149],"class_list":["post-2623","post","type-post","status-publish","format-standard","hentry","category-whisper","tag-whisper","category-150-id","post-seq-1","post-parity-odd","meta-position-corners","fix"],"_links":{"self":[{"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/posts\/2623","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2623"}],"version-history":[{"count":5,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/posts\/2623\/revisions"}],"predecessor-version":[{"id":2635,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=\/wp\/v2\/posts\/2623\/revisions\/2635"}],
"wp:attachment":[{"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2623"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2623"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labalec.fr\/erwan\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2623"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}