Free English Paper: Learning Bandwidth Expansion Using Perceptually-Motivated Loss - 2019 IEEE

Paper details
Paper title	Learning Bandwidth Expansion Using Perceptually-motivated Loss
Published	2019
Pages	5
Cost	The English paper is free to download.
Database	IEEE
Base paper	This is not a base paper
Paper type	ISI
Format	PDF
ISSN	1520-6149
Conceptual model	None
Questionnaire	None
Variables	None
References	Included
Related fields	Computer engineering
Related specializations	Software engineering, software design and development, computer systems architecture
Presentation type	Conference
Journal / Conference	International Conference on Acoustics, Speech and Signal Processing (ICASSP)
University	Princeton University
Keywords	Bandwidth expansion, bandwidth extension, audio super-resolution, deep learning
DOI	https://doi.org/10.1109/ICASSP.2019.8682367
Product code	E13342
Translation status	No ready-made translation of this paper is available.

Table of contents:

Abstract

1- INTRODUCTION

2- RELATED WORK

3- METHOD

4- EVALUATION

5- CONCLUSION

References

Excerpt from the paper:

Abstract

We introduce a perceptually motivated approach to bandwidth expansion for speech. Our method pairs a new 3-way split variant of the FFTNet neural vocoder structure with a perceptual loss function, combining objectives from both the time and frequency domains. Mean opinion score tests show that it outperforms baseline methods from both domains, even for extreme bandwidth expansion.
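
The excerpt does not give the loss function itself. Purely as an illustration of what combining a time-domain objective with a frequency-domain objective can look like, here is a minimal sketch using only the Python standard library. Every name in it (`time_loss`, `spectral_loss`, `combined_loss`), the frame length, the hop size, and the `weight` parameter are assumptions made for this example and are not taken from the paper; the paper's actual perceptual loss may differ substantially.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def time_loss(pred, target):
    """Time-domain term: mean absolute error between the two waveforms."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def spectral_loss(pred, target, frame_len=64, hop=32, eps=1e-6):
    """Frequency-domain term: mean absolute log-magnitude difference per frame."""
    if len(target) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    total, count = 0.0, 0
    for start in range(0, len(target) - frame_len + 1, hop):
        p_mag = magnitude_spectrum(pred[start:start + frame_len])
        t_mag = magnitude_spectrum(target[start:start + frame_len])
        for p, t in zip(p_mag, t_mag):
            total += abs(math.log(p + eps) - math.log(t + eps))
            count += 1
    return total / count


def combined_loss(pred, target, weight=0.5):
    """Hypothetical combination: time-domain term plus a weighted spectral term."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return time_loss(pred, target) + weight * spectral_loss(pred, target)


if __name__ == "__main__":
    sr = 16000  # illustrative sampling rate, not the paper's setting
    # Wideband reference: a 1 kHz tone plus a weaker 6 kHz tone.
    wide = [math.sin(2 * math.pi * 1000 * i / sr)
            + 0.5 * math.sin(2 * math.pi * 6000 * i / sr) for i in range(256)]
    # Band-limited stand-in that lacks the 6 kHz component.
    narrow = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]
    print("loss(wide, wide)   =", combined_loss(wide, wide))
    print("loss(narrow, wide) =", combined_loss(narrow, wide))
```

In this toy setup the spectral term penalizes the missing high-frequency energy in the band-limited signal, while the time-domain term penalizes sample-level mismatch; how the two are balanced is a design choice that this sketch leaves to the hypothetical `weight` parameter.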

INTRODUCTION

This paper introduces a deep learning-based method for bandwidth expansion of human speech. The goal of the bandwidth expansion (BWE) problem, also called “bandwidth extension” and “audio super-resolution,” is to expand the frequency range of an input audio signal. Its traditional applications are in telephony, where the bandwidth of telephone speech may be limited to below 4 kHz, thus aiming to render muffled speech more intelligible [1]. In the context of newer audio synthesis tasks, such as text-to-speech (TTS) and consumer digital media creation, there arises a need for more extreme BWE, such as to 44.1 kHz or 48 kHz. In WaveNet-like applications, for example, speech is synthesized at a low sampling rate for efficiency reasons [2]. BWE may be applied to synthesized audio to improve the listening experience. In another use case, many consumers record speech on low-bandwidth devices, such as a consumer-grade microphone, and would like higher-resolution versions of their recordings for podcasts or other artistic purposes. In these applications, the input bandwidth might not be as low as that of telephone transmission, but rather around 8 kHz. Our objective is to super-resolve speech to high-definition audio; in our experiments, we convert 8 kHz to 44.1 kHz, although these are just parameters of the method. By expanding beyond 16 kHz, we emphasize not intelligibility as in traditional BWE, but high perceptual quality and sense of presence in the recording, since the extreme upper bands offer information beyond just speech content, including the finer details of the speaker’s voice and environment.
