Selecting feature frames for automatic speaker recognition using mutual information

Chi Sang Jung, Moo Young Kim, Hong-Goo Kang

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

In this paper, an information-theoretic approach to selecting feature frames for speaker recognition systems is proposed. A conventional approach, in which the frame shift is fixed to around half of the frame length, may not be the best choice because the characteristics of the speech signal can change rapidly, especially at phonetic boundaries. Experimental results show that recognition accuracy increases if the frame interval is directly controlled using phonetic information. Building on the well-known fact that recognition accuracy is directly correlated with the amount of mutual information, this paper suggests a novel feature frame selection method for speaker recognition. Specifically, feature frames are chosen to have minimum redundancy among the selected frames but maximum relevance to the speaker models. Experiments verify that the proposed method produces consistent improvement, especially in a speaker verification system, and that it is robust against variations in the acoustic environment.
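
The selection criterion summarized above follows the minimum-redundancy, maximum-relevance (mRMR) idea. As a rough illustration only, not the paper's algorithm, the Python sketch below greedily keeps frames whose relevance to a speaker model is high while their estimated redundancy with the already-selected frames is low. The array names, the placeholder relevance scores, and the Gaussian correlation-based proxy for mutual information between frames are assumptions made here for demonstration; the paper's own estimators and speaker models are not reproduced.

import numpy as np

def pairwise_redundancy(frames: np.ndarray) -> np.ndarray:
    """Gaussian mutual-information proxy between frame vectors:
    I = -0.5 * log(1 - rho^2), with rho the correlation between two frames."""
    z = frames - frames.mean(axis=1, keepdims=True)
    z = z / (frames.std(axis=1, keepdims=True) + 1e-12)
    rho = np.clip(z @ z.T / frames.shape[1], -0.999, 0.999)
    return -0.5 * np.log(1.0 - rho ** 2)

def select_frames(frames: np.ndarray, relevance: np.ndarray, k: int) -> list:
    """Greedily pick k frames with high relevance and low average
    redundancy with respect to the frames already selected."""
    k = min(k, len(frames))
    red = pairwise_redundancy(frames)
    selected = [int(np.argmax(relevance))]  # seed with the most relevant frame
    while len(selected) < k:
        candidates = [i for i in range(len(frames)) if i not in selected]
        # mRMR-style score: relevance minus mean redundancy to the selected set
        scores = [relevance[i] - red[i, selected].mean() for i in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(200, 13))   # stand-in for 200 frames of 13-dim features
    relevance = rng.random(200)           # placeholder per-frame relevance scores
    print(select_frames(frames, relevance, k=20))

In practice, the relevance term would come from an estimate of mutual information between each frame and the speaker model, which this sketch leaves as an input.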

Original language: English
Article number: 5276841
Pages (from-to): 1332-1340
Number of pages: 9
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 18
Issue number: 6
DOIs: 10.1109/TASL.2009.2033631
Publication status: Published - 2010 Aug 26

Fingerprint

  • Speech analysis
  • Redundancy
  • Acoustics
  • Phonetics
  • Experiments
  • Intervals
  • Shift

All Science Journal Classification (ASJC) codes

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

@article{a5284c402ef44bb4aa8bfd4f1c2e1179,
title = "Selecting feature frames for automatic speaker recognition using mutual information",
author = "Jung, {Chi Sang} and Kim, {Moo Young} and Kang, {Hong-Goo}",
year = "2010",
month = "8",
day = "26",
doi = "10.1109/TASL.2009.2033631",
language = "English",
volume = "18",
pages = "1332--1340",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

Selecting feature frames for automatic speaker recognition using mutual information. / Jung, Chi Sang; Kim, Moo Young; Kang, Hong-Goo.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 6, 5276841, 26.08.2010, p. 1332-1340.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Selecting feature frames for automatic speaker recognition using mutual information

AU - Jung, Chi Sang

AU - Kim, Moo Young

AU - Kang, Hong-Goo

PY - 2010/8/26

Y1 - 2010/8/26

UR - http://www.scopus.com/inward/record.url?scp=77955785588&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955785588&partnerID=8YFLogxK

U2 - 10.1109/TASL.2009.2033631

DO - 10.1109/TASL.2009.2033631

M3 - Article

AN - SCOPUS:77955785588

VL - 18

SP - 1332

EP - 1340

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 6

M1 - 5276841

ER -