Article details | |
Article title | Improvement of automatic speech recognition systems via nonlinear dynamical features evaluated from the recurrence plot of speech signals |
Publication | 2017 article |
Number of pages (English article) | 12 pages |
Database | Elsevier |
Manuscript type | Research article |
Base article | This is not a base article |
Index | Scopus – Master Journals – JCR |
Article type | ISI |
Impact factor (IF) | 1.747 (2017) |
H-index | 43 (2018) |
SJR | 0.401 (2018) |
Related fields | Computer engineering – Biomedical engineering |
Related specializations | Algorithms and computation – Bioelectrics |
Presentation type | Journal |
Journal / Conference | Computers & Electrical Engineering |
University | Biomedical Engineering Department, Amirkabir University of Technology, Hafez Ave., P.O. Box 15875-4413, Tehran, Iran |
Keywords | Automatic speech recognition, Mel-frequency cepstral coefficients, Reconstructed phase space, Recurrence plot, Two-dimensional wavelet transform |
DOI | https://doi.org/10.1016/j.compeleceng.2016.07.006 |
Table of contents: |
Outline Highlights Abstract Graphical abstract Keywords 1. Introduction 3. Experimental methodology 4. Experimental results 5. Conclusion References Vitae |
Excerpt from the article: |
Abstract
Spectral-based features, typically used in Automatic Speech Recognition (ASR) systems, discard the phase information of speech signals. Employing additional features that retain the phase of the signal may therefore fill this gap. Embedding the speech signal in the Reconstructed Phase Space (RPS) and then extracting useful features from it is a recently considered approach in this field. In this paper, we follow this approach by evaluating features from the Recurrence Plot (RP) of the speech signals embedded in the RPS; the proposed features are obtained by applying a two-dimensional wavelet transform to the resulting RP diagrams. The proposed features are examined in an ASR task both alone and in combination with the traditional Mel-Frequency Cepstral Coefficients (MFCC). In the second case, on the English TIMIT corpus, an absolute improvement of 3.94% in phoneme recognition accuracy is gained over using the MFCC features alone.

Introduction
In recent decades, a variety of linear models for speech coding, synthesis, and recognition with acceptable performance have been introduced. Along these lines, many studies have improved speech recognition by employing novel methods [1,2] or have addressed the detection of mispronunciation using Hidden Markov models [3]; however, the human speech production system involves nonlinear aerodynamic phenomena that generally cannot be captured by linear models [4]. Therefore, nonlinear methods could potentially provide effective computational models for extracting acoustic features that are useful for detecting these nonlinear phenomena [4]. Furthermore, some recent studies have shown that utilizing nonlinear characteristics may improve the performance of ASR systems [5]. Conventional ASR systems exploit frequency-domain features such as Mel-frequency cepstral coefficients [6].
Traditional frequency-domain methods typically extract only the first- and second-order properties from the spectral patterns of speech signals [7]. However, many signals generated by nonlinear differential equations have broadband spectra [8]. In such cases, frequency-domain techniques are deficient, because the information in such a signal cannot be separated in the frequency domain alone [5]. |
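The pipeline described in the abstract (embed a speech frame in the RPS, build its recurrence plot, then apply a two-dimensional wavelet transform) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the excerpt does not state the embedding dimension, the delay, the recurrence threshold, the wavelet family, or how the final features are computed. The function names, the defaults `dim=3`, `tau=2`, `eps_ratio=0.1`, the single-level Haar transform, and the sub-band mean energies used as features are all assumptions chosen for illustration, and a synthetic sine wave stands in for a real speech frame.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for this dim and tau")
    return np.stack([x[k * tau : k * tau + n_points] for k in range(dim)], axis=1)


def recurrence_plot(points, eps_ratio=0.1):
    """Binary matrix with R[i, j] = 1 when embedded points i and j are close.

    Assumption: the threshold is a fraction of the largest pairwise distance.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    eps = eps_ratio * dist.max()
    return (dist <= eps).astype(float)


def haar2d_level(img):
    """One level of an averaging 2-D Haar transform over 2x2 blocks.

    Returns (approximation, row_detail, col_detail, diag_detail).
    """
    rows = img.shape[0] // 2 * 2
    cols = img.shape[1] // 2 * 2
    img = img[:rows, :cols]  # crop to even size
    a = img[0::2, 0::2]  # top-left sample of each block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 4
    row_detail = (a + b - c - d) / 4  # upper row minus lower row
    col_detail = (a - b + c - d) / 4  # left column minus right column
    diag_detail = (a - b - c + d) / 4  # diagonal difference
    return approx, row_detail, col_detail, diag_detail


def rp_wavelet_features(frame, dim=3, tau=2, eps_ratio=0.1):
    """Hypothetical feature vector: mean energy of each Haar sub-band of the RP."""
    rp = recurrence_plot(delay_embed(frame, dim, tau), eps_ratio)
    return np.array([np.mean(band ** 2) for band in haar2d_level(rp)])


# Synthetic stand-in for a speech frame (200 samples of a sine wave).
frame = np.sin(2 * np.pi * 0.05 * np.arange(200))
features = rp_wavelet_features(frame)
print(features)
```

In an ASR front end, a vector like `features` would be computed per frame and could be concatenated with that frame's MFCC vector, which corresponds to the combined setting the abstract reports.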