A Small Brazilian Portuguese Speech Corpus for Speaker Recognition Study
DOI: https://doi.org/10.5433/1679-0375.2024.v45.50518
Keywords: Brazilian Portuguese speech corpus, GMM, MFCC, Speaker recognition
Abstract
A small Brazilian Portuguese speech corpus was created for educational purposes to study a state-of-the-art speaker recognition system. The system uses the Gaussian Mixture Model (GMM) as the statistical model of each speaker and Mel-frequency cepstral coefficients (MFCC) as acoustic features. The results obtained with clean and noisy speech agree with expectations: the greater the mismatch between training and test conditions, the worse the recognition performance. Performance also improves as the utterance length increases. Finally, the results can serve as baselines for comparison with other statistical speaker models built from different acoustic features under different acoustic conditions.
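The identification step of a GMM-based system such as the one summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each speaker model is already trained (in practice, typically with the EM algorithm) and uses diagonal covariances, and it replaces real MFCC frames with made-up two-dimensional vectors. All names (`speaker_models`, `identify_speaker`, and so on) and all parameter values are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm), where gmm is a list of (weight, mean, var) components."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def average_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of an utterance under one speaker model."""
    return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)


def identify_speaker(frames, speaker_models):
    """Return the best-scoring speaker name and the score of every speaker."""
    scores = {
        name: average_log_likelihood(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, hand-written models: two components per speaker, 2-D features.
# A real system would estimate these parameters from MFCC training frames.
speaker_models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Made-up test frames standing in for the MFCC vectors of one utterance.
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]
best, scores = identify_speaker(test_frames, speaker_models)
print(best)
```

Averaging the log-likelihood over frames keeps the scores of utterances of different lengths comparable. Longer utterances contribute more frames to that average, which is one intuitive reason why results may improve with utterance length.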
License
Copyright (c) 2024 Alberto Yoshihiro Nakano, Hélio Rodrigues da Silva, Julian Rodrigues Dourado, Felipe Walter Dafico Pfrimer
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.