Deep Neural Networks for End-to-End Optimized Speech Coding
DC Field | Value | Language
dc.contributor.advisor | Jacobs, David | en_US |
dc.contributor.author | Kankanahalli, Srihari | en_US |
dc.contributor.department | Computer Science | en_US |
dc.contributor.publisher | Digital Repository at the University of Maryland | en_US |
dc.contributor.publisher | University of Maryland (College Park, Md.) | en_US |
dc.date.accessioned | 2017-09-14T05:49:44Z | |
dc.date.available | 2017-09-14T05:49:44Z | |
dc.date.issued | 2017 | en_US |
dc.description.abstract | Modern compression algorithms are the result of years of research; industry standards such as MP3, JPEG, and G.722.1 required complex hand-engineered compression pipelines, often with extensive manual tuning by the engineers who created them. Recently, deep neural networks have shown a sophisticated ability to learn directly from data, achieving remarkable success over traditional hand-engineered features in many areas. Our aim is to extend these "deep learning" methods into the domain of compression. We present a novel deep neural network model and train it to optimize all the steps of a wideband speech-coding pipeline (compression, quantization, entropy coding, and decompression) end to end, directly from raw speech data, with no manual feature engineering necessary. In testing, our learned speech coder performs on par with or better than current standards at a variety of bitrates (~9 kbps up to ~24 kbps). It also runs in real time on an Intel i7-4790K CPU. | en_US |
dc.identifier | https://doi.org/10.13016/M2M03XX8H | |
dc.identifier.uri | http://hdl.handle.net/1903/20035 | |
dc.language.iso | en | en_US |
dc.subject.pqcontrolled | Artificial intelligence | en_US |
dc.subject.pqcontrolled | Acoustics | en_US |
dc.title | Deep Neural Networks for End-to-End Optimized Speech Coding | en_US |
dc.type | Thesis | en_US |
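The abstract above describes a pipeline in which compression, quantization, entropy coding, and decompression are learned jointly from raw speech. The following is a minimal, hypothetical sketch of that general idea in PyTorch, not the thesis model: the layer shapes, the straight-through rounding quantizer, the crude rate penalty, and all hyperparameters are illustrative assumptions rather than details taken from the thesis.

```python
# Minimal sketch of an end-to-end neural speech coder (NOT the thesis model):
# a convolutional encoder/decoder with straight-through rounding as the quantizer
# and an ad hoc rate penalty standing in for a real entropy/bitrate estimate.
import torch
import torch.nn as nn

class ToySpeechCoder(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # Encoder: downsample the raw waveform into a compact latent code.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, stride=4, padding=4),
        )
        # Decoder: mirror of the encoder, reconstructs the waveform from the code.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=8, stride=4, padding=2),
        )

    def quantize(self, z):
        # Straight-through rounding: hard integers on the forward pass,
        # identity gradient on the backward pass, so training can proceed
        # through the otherwise non-differentiable quantization step.
        return z + (torch.round(z) - z).detach()

    def forward(self, x):
        z = self.encoder(x)
        q = self.quantize(z)
        return self.decoder(q), q

# One illustrative training step: reconstruction loss plus a rate penalty
# (mean absolute code magnitude) as a stand-in for an entropy/bitrate term.
if __name__ == "__main__":
    model = ToySpeechCoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    wave = torch.randn(8, 1, 4096)  # batch of synthetic waveform frames
    recon, code = model(wave)
    loss = nn.functional.mse_loss(recon, wave) + 1e-3 * code.abs().mean()
    loss.backward()
    opt.step()
```

The straight-through trick is one common way to let gradients flow through quantization, which is what makes it possible to optimize reconstruction quality and a rate term jointly; that joint optimization is the sense in which such a coder is trained "end to end."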
Files
Original bundle
- Name: Kankanahalli_umd_0117N_18451.pdf
- Size: 2.27 MB
- Format: Adobe Portable Document Format