<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Whisper on Juntak Noh — AI Notes</title>
    <link>https://ai.klavierhye.cc/tags/whisper/</link>
    <description>Recent content in Whisper on Juntak Noh — AI Notes</description>
    <generator>Hugo -- 0.147.7</generator>
    <language>en</language>
    <lastBuildDate>Thu, 19 Feb 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://ai.klavierhye.cc/tags/whisper/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Did Fine-Tuning Actually Help? Evaluating and Benchmarking Whisper for Korean STT</title>
      <link>https://ai.klavierhye.cc/posts/whisper-evaluation/</link>
      <pubDate>Thu, 19 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://ai.klavierhye.cc/posts/whisper-evaluation/</guid>
      <description>&lt;p&gt;&lt;em&gt;This is &lt;strong&gt;Part 3&lt;/strong&gt; of a three-part series on fine-tuning Whisper for Korean speech-to-text: Preprocess → Train → &lt;strong&gt;Evaluate&lt;/strong&gt;. Here we measure whether the fine-tuned model actually improved, and by how much. &lt;a href=&#34;https://ai.klavierhye.cc/posts/whisper-preprocessing/&#34;&gt;Part 1&lt;/a&gt; covered preprocessing; &lt;a href=&#34;https://ai.klavierhye.cc/posts/whisper-training/&#34;&gt;Part 2&lt;/a&gt; covered training.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;A trained model without evaluation is just a checkpoint on disk. You can stare at the training loss curve and hope it went down, but until you run the model on held-out data and measure something concrete — CER, WER, per-category breakdowns — you don&amp;rsquo;t know whether the fine-tuning worked, whether it regressed on certain domains, or how it compares to the baseline you started from.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Training Whisper on Precomputed Features: Full Fine-Tune for Korean STT</title>
      <link>https://ai.klavierhye.cc/posts/whisper-training/</link>
      <pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://ai.klavierhye.cc/posts/whisper-training/</guid>
      <description>&lt;p&gt;&lt;em&gt;This is &lt;strong&gt;Part 2&lt;/strong&gt; of a three-part series on fine-tuning Whisper for Korean speech-to-text: Preprocess → &lt;strong&gt;Train&lt;/strong&gt; → Evaluate. Here we load the preprocessed dataset and run the training loop. Part 1 covered preprocessing; Part 3 will cover evaluation and benchmarking.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;With precomputed mel spectrograms and tokenized labels on disk, the next step is to plug them into a training loop and optimize the model. That sounds straightforward until you start making choices: full fine-tuning or LoRA? What learning rate and batch size? How do you pad variable-length sequences correctly for an encoder-decoder, and how do you avoid wasting GPU memory or blowing up training? This post walks through the training setup I use for Whisper large-v3 on Korean telephonic audio — and the engineering trade-offs behind each decision.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Stop Recomputing Mel Spectrograms: Preprocessing Your Data Before Whisper Fine-Tuning</title>
      <link>https://ai.klavierhye.cc/posts/whisper-preprocessing/</link>
      <pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://ai.klavierhye.cc/posts/whisper-preprocessing/</guid>
      <description>&lt;p&gt;&lt;em&gt;This is &lt;strong&gt;Part 1&lt;/strong&gt; of a three-part series on fine-tuning Whisper for Korean speech-to-text: &lt;strong&gt;Preprocess&lt;/strong&gt; → Train → Evaluate. In this post, we build the data preprocessing pipeline. Parts 2 and 3 will cover the training loop and evaluation/benchmarking, respectively.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;When I first started fine-tuning OpenAI&amp;rsquo;s Whisper for Korean speech-to-text, I noticed something frustrating. Every single time I kicked off a training run — whether I was tweaking the learning rate, adjusting the batch size, or experimenting with a new scheduler — the framework would spend &lt;em&gt;hours&lt;/em&gt; churning through raw audio files before a single gradient was computed. The preprocessing step was identical each time: load WAV files, resample, compute mel spectrograms, tokenize transcriptions. Nothing about the data had changed, yet I was paying the full cost of data preparation on every attempt.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
