Robust Voice Mining Techniques for Telephone Conversations

Loading...
Thumbnail Image

Files

umi-umd-3672.pdf (1.24 MB)
No. of downloads: 2687

Publication or External Link

Date

2006-07-28

Citation

DRUM DOI

Abstract

Voice mining involves speaker detection in a set of multi-speaker files. In published work, training data is used for constructing target speaker models. In this study, a new voice mining scenario was considered, where there is no demarcation between training and testing data and prior target speaker models are absent. Given a database of telephone conversations, the task is to identify conversations having one or more speakers in common. Various approaches including semi-automatic and fully automatic techniques were explored and different scoring strategies were considered. Given the poor audio quality, automatic speaker segmentation is not very effective. A new technique was developed which does not require speaker segmentation by training a multi-speaker model on the entire conversation. This technique is more robust and it outperforms the automatic speaker segmentation approach. On the ENRON database, the EER is 15.98% and 6.25% for at least one and two speakers in common, respectively.

Notes

Rights