Robust Voice Mining Techniques for Telephone Conversations

Manocha, Sandeep

Files

umi-umd-3672.pdf (1.24 MB)

Date

2006-07-28

Authors

Manocha, Sandeep

Advisor

Espy-Wilson, Carol Y.

Abstract

Voice mining involves detecting speakers in a set of multi-speaker files. In published work, training data is used to construct target speaker models. This study considers a new voice mining scenario in which there is no demarcation between training and testing data and no prior target speaker models are available. Given a database of telephone conversations, the task is to identify conversations that have one or more speakers in common. Several approaches, including semi-automatic and fully automatic techniques, were explored, and different scoring strategies were considered. Because the audio quality is poor, automatic speaker segmentation is not very effective. A new technique was developed that avoids speaker segmentation by training a multi-speaker model on the entire conversation. This technique is more robust and outperforms the automatic speaker segmentation approach. On the ENRON database, the equal error rate (EER) is 15.98% for at least one speaker in common and 6.25% for two speakers in common.
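
For readers unfamiliar with the equal error rate reported in the abstract, the following is a minimal sketch of how such a figure can be computed from pairwise conversation scores. It is not taken from the thesis: the function name `equal_error_rate`, the score values, and the convention that a higher score means "more likely to share a speaker" are all assumptions made for illustration. The sketch sweeps every observed score as a decision threshold and reports the point where the miss rate and the false-alarm rate are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the equal error rate (EER) from two lists of scores.

    target_scores: scores for conversation pairs that truly share a speaker.
    nontarget_scores: scores for pairs that do not.
    Assumption (hypothetical): a higher score means a more likely match.
    Returns (eer, threshold) at the threshold where the miss rate and the
    false-alarm rate are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    best_eer = None
    best_threshold = None
    for t in thresholds:
        # Miss: a true pair scored below the threshold and was rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-matching pair scored at or above the threshold.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (miss + false_alarm) / 2.0
            best_threshold = t
    return best_eer, best_threshold


# Illustrative, made-up scores (not data from the thesis).
target = [0.9, 0.8, 0.7, 0.3]
nontarget = [0.6, 0.2, 0.1, 0.4]
eer, threshold = equal_error_rate(target, nontarget)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of scores the two error rates may never be exactly equal, which is why the sketch averages them at the closest point; with a large score set the approximation tightens.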

URI (handle)

http://hdl.handle.net/1903/3827

Collections

UMD Theses and Dissertations
Electrical & Computer Engineering Theses and Dissertations