Link to Repository of Code
Describe my work briefly
I am happy to share an update on my Week 2 and Week 3 progress. These two weeks were crucial: I worked on data pre-processing and on computing standard 13-dimensional Mel-Frequency Cepstral Coefficient (MFCC) features for the speech data.
It all started with installing Kaldi and its prerequisites. Although the Kaldi installation looks straightforward, it proved to be quite challenging, and the documentation is not always easy to follow. I ran into many errors. I resolved some of them by referring to Kaldi’s documentation; for others, I had to re-install Kaldi altogether, which took 4-5 hours each time depending on the server configuration. I documented all the necessary steps on GitHub.
To achieve this, a pre-processing module needs to be written. Further, a manifest of the audio files and transcripts needs to be created, and the data needs to be arranged in Kaldi’s format so that it can be fed to Kaldi’s MFCC feature extraction module.
(env) agarwal@:~/backup/kaldi-trunk/egs/recipe_v1/data/train$ tree
.
├── cmvn.scp
├── feats.scp
├── frame_shift
├── segments
├── spk2utt
├── text
├── utt2dur
├── utt2num_frames
├── utt2spk
└── wav.scp
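For reference, here is a minimal Python sketch of how a pre-processing script could write the core Kaldi data files (wav.scp, text, utt2spk and spk2utt). It assumes a hypothetical manifest of (utterance ID, speaker ID, WAV path, transcript) tuples and is not the project's actual pre-processing module. The other files in the tree above, such as feats.scp and cmvn.scp, are written later by Kaldi's feature scripts, and segments is only needed when utterances are cut out of longer recordings.

```python
import os
import tempfile
from collections import defaultdict


def write_kaldi_data_dir(manifest, out_dir):
    """Write wav.scp, text, utt2spk and spk2utt from a hypothetical manifest.

    manifest: iterable of (utt_id, spk_id, wav_path, transcript) tuples.
    Utterance IDs are assumed to start with the speaker ID, so sorting by
    utterance ID also keeps each speaker's utterances together.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Kaldi expects these files sorted by key; for ASCII IDs Python's string
    # order matches LC_ALL=C order.
    rows = sorted(manifest, key=lambda row: row[0])
    spk2utt = defaultdict(list)

    def path(name):
        return os.path.join(out_dir, name)

    with open(path("wav.scp"), "w", encoding="utf-8") as wav, \
            open(path("text"), "w", encoding="utf-8") as text, \
            open(path("utt2spk"), "w", encoding="utf-8") as utt2spk:
        for utt_id, spk_id, wav_path, transcript in rows:
            wav.write(f"{utt_id} {wav_path}\n")
            text.write(f"{utt_id} {transcript}\n")
            utt2spk.write(f"{utt_id} {spk_id}\n")
            spk2utt[spk_id].append(utt_id)

    # spk2utt is the inverse of utt2spk (Kaldi's utils/utt2spk_to_spk2utt.pl does the same)
    with open(path("spk2utt"), "w", encoding="utf-8") as out:
        for spk_id in sorted(spk2utt):
            out.write(f"{spk_id} {' '.join(spk2utt[spk_id])}\n")


if __name__ == "__main__":
    demo = [
        ("spk2-utt1", "spk2", "/data/wav/b.wav", "guten tag"),
        ("spk1-utt1", "spk1", "/data/wav/a.wav", "hallo welt"),
    ]
    out = tempfile.mkdtemp()
    write_kaldi_data_dir(demo, out)
    with open(os.path.join(out, "spk2utt"), encoding="utf-8") as f:
        print(f.read())
```

After writing the files, Kaldi's utils/validate_data_dir.sh (with --no-feats before features exist) can check the directory, and utils/fix_data_dir.sh can repair sorting or consistency problems.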
I used several scripts from the Kaldi Wall Street Journal (WSJ), Tedlium, and Tuda-De projects.
Feature extraction is a critical step in any ASR system. Kaldi's scripts can be used to convert the speech data into standard 13-dimensional MFCC features. I used the scripts from the Wall Street Journal (WSJ) recipe.
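To make "13-dimensional MFCC" concrete, the pure-Python sketch below walks through the usual pipeline: framing, DC-offset removal, pre-emphasis, windowing, power spectrum, a triangular mel filterbank, log, and a DCT that keeps 13 coefficients per frame. It borrows Kaldi's default frame settings (25 ms window, 10 ms shift, 23 mel bins, 13 cepstra) but uses a Hamming window and skips dithering, energy replacement of C0 and cepstral liftering, so its values will not match Kaldi's compute-mfcc-feats. It is only meant to illustrate the technique; the Kaldi scripts remain the tool to use in practice.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_filterbank(num_bins, n_fft, sample_rate, low_hz=20.0):
    """Triangular filters spaced evenly on the mel scale, over the FFT bins."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(sample_rate / 2)
    step = (hi - lo) / (num_bins + 1)
    edges = [lo + i * step for i in range(num_bins + 2)]
    bin_mels = [hz_to_mel(k * sample_rate / n_fft) for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(num_bins):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        weights = []
        for f in bin_mels:
            if left < f <= center:
                weights.append((f - left) / (center - left))
            elif center < f < right:
                weights.append((right - f) / (right - center))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10,
         num_mel=23, num_ceps=13, n_fft=512, preemph=0.97):
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    assert n_fft >= frame_len and (n_fft & (n_fft - 1)) == 0
    bank = mel_filterbank(num_mel, n_fft, sample_rate)
    # Hamming window (Kaldi's default is its "povey" window)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    feats = []
    # Keep only frames that fit completely, like Kaldi's default snip-edges=true
    for start in range(0, len(signal) - frame_len + 1, shift):
        frame = signal[start:start + frame_len]
        mean = sum(frame) / frame_len
        frame = [s - mean for s in frame]  # remove DC offset
        frame = [frame[0] * (1 - preemph)] + [
            frame[i] - preemph * frame[i - 1] for i in range(1, frame_len)]  # pre-emphasis
        padded = [s * w for s, w in zip(frame, window)] + [0.0] * (n_fft - frame_len)
        power = [abs(c) ** 2 for c in fft(padded)[: n_fft // 2 + 1]]
        log_mel = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10))
                   for filt in bank]
        # DCT-II of the log mel energies; keep the first num_ceps coefficients
        feats.append([sum(log_mel[m] * math.cos(math.pi * k * (m + 0.5) / num_mel)
                          for m in range(num_mel)) for k in range(num_ceps)])
    return feats


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]  # 1 s, 440 Hz
    feats = mfcc(tone)
    print(len(feats), len(feats[0]))  # 98 frames x 13 coefficients
```

With a 400-sample window and a 160-sample shift, one second of 16 kHz audio gives 1 + (16000 - 400) // 160 = 98 frames, which is also how Kaldi counts frames with snip-edges enabled.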
# $x is the data set being processed (here dev); $mfccdir is where the CMVN statistics are written
steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir
(env) agarwal@:~/backup/kaldi-trunk/egs/recipe_v1/exp/make_mfcc/dev$ tree
.
├── cmvn_dev.log
├── make_mfcc_dev.1.log
├── make_mfcc_dev.2.log
├── make_mfcc_dev.3.log
├── make_mfcc_dev.4.log
└── make_mfcc_dev.5.log
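The compute_cmvn_stats.sh step produces cepstral mean and variance normalization (CMVN) statistics. By default they are accumulated per speaker (using spk2utt), stored as a 2 x (dim + 1) matrix per speaker and referenced from cmvn.scp, and later applied with apply-cmvn, which by default normalizes only the mean. The sketch below mimics that statistics layout in plain Python for one toy speaker; it is an illustration, not a reimplementation of Kaldi's compute-cmvn-stats or apply-cmvn.

```python
import math


def accumulate_cmvn_stats(frames):
    """Kaldi-style CMVN stats for one speaker: a 2 x (dim + 1) matrix.

    Row 0 holds per-dimension sums with the frame count in the last column;
    row 1 holds per-dimension sums of squares (last column unused, left at 0).
    """
    dim = len(frames[0])
    sums, squares = [0.0] * (dim + 1), [0.0] * (dim + 1)
    for frame in frames:
        for d, value in enumerate(frame):
            sums[d] += value
            squares[d] += value * value
        sums[dim] += 1.0
    return [sums, squares]


def apply_cmvn(frames, stats, norm_vars=False):
    """Subtract the speaker mean and optionally divide by the standard deviation."""
    sums, squares = stats
    dim, count = len(sums) - 1, sums[-1]
    mean = [sums[d] / count for d in range(dim)]
    std = [math.sqrt(max(squares[d] / count - mean[d] ** 2, 1e-10)) for d in range(dim)]
    out = []
    for frame in frames:
        row = [frame[d] - mean[d] for d in range(dim)]
        if norm_vars:
            row = [row[d] / std[d] for d in range(dim)]
        out.append(row)
    return out


if __name__ == "__main__":
    speaker_frames = [[1.0, 2.0], [3.0, 6.0]]  # toy 2-dimensional "features"
    stats = accumulate_cmvn_stats(speaker_frames)
    print(stats)                                    # [[4.0, 8.0, 2.0], [10.0, 40.0, 0.0]]
    print(apply_cmvn(speaker_frames, stats))        # [[-1.0, -2.0], [1.0, 2.0]]
    print(apply_cmvn(speaker_frames, stats, True))  # [[-1.0, -1.0], [1.0, 1.0]]
```

Per-speaker normalization like this removes channel and speaker offsets from the cepstra; in a real setup all frames of a speaker's utterances (as listed in spk2utt) would be pooled before computing the statistics.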
Weeks 4 and 5 are the evaluation period. I will take feedback from my mentors and improve my work accordingly. The next task is to build the German phoneme dictionary. Since this data is not readily available, I plan to use the scripts provided by researchers at Hamburg University, Germany. The technique still needs to be verified, and some bottlenecks are expected.
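Whatever tool ends up generating the German pronunciations, Kaldi expects them in a lexicon.txt file (conventionally under data/local/dict) with one "word phone1 phone2 ..." entry per line. The sketch below writes that format and lists transcript words that have no entry yet, which is one simple way to check dictionary coverage against the text file. The phone symbols are hypothetical placeholders, and the code does not reproduce the Hamburg scripts.

```python
import os
import tempfile


def write_lexicon(entries, path):
    """entries maps each word to a list of pronunciations (each a list of phones)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in sorted(entries):
            for pron in entries[word]:
                f.write(f"{word} {' '.join(pron)}\n")


def find_oov_words(text_lines, entries):
    """Return words in Kaldi 'text' lines ("<utt-id> w1 w2 ...") missing from entries."""
    oov = set()
    for line in text_lines:
        for word in line.split()[1:]:  # skip the utterance ID
            if word not in entries:
                oov.add(word)
    return sorted(oov)


if __name__ == "__main__":
    # Hypothetical phone symbols, for illustration only
    lexicon = {"hallo": [["h", "a", "l", "o:"]], "welt": [["v", "E", "l", "t"]]}
    text = ["spk1-utt1 hallo welt", "spk2-utt1 guten tag welt"]
    write_lexicon(lexicon, os.path.join(tempfile.mkdtemp(), "lexicon.txt"))
    print(find_oov_words(text, lexicon))  # ['guten', 'tag']
```

Words reported as out-of-vocabulary are the ones a grapheme-to-phoneme step would still need to cover.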
I will keep you posted about my progress!