Abstract

This paper proposes a deep learning-based generalized empirical flow model (EFM) that provides fast and accurate prediction of the glottal flow during normal phonation. The approach assumes that the vibration of the vocal folds can be represented by a universal kinematics equation (UKE), which is used to generate a glottal shape library. For each shape in the library, the ground truth values of the flow rate and pressure distribution are obtained from the high-fidelity Navier–Stokes (N–S) solution. A fully connected deep neural network (DNN) is then trained to build the empirical mapping from the shapes to the flow rate and pressure distributions. The resulting DNN-based EFM is coupled with a finite element method (FEM)-based solid dynamics solver for fluid–structure interaction (FSI) simulation of phonation. The EFM is evaluated by comparing its predictions with the N–S solutions for both static glottal shapes and FSI simulations. The results demonstrate good prediction performance in both accuracy and efficiency.
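The mapping described above can be pictured with a minimal sketch of a fully connected network that takes a vector of glottal shape parameters and returns a flow rate together with a pressure distribution. This is an illustration only, not the authors' implementation: the number of shape parameters, the number of pressure sample points, the layer widths, and the ReLU activation are all assumptions, and the weights are random rather than trained on N–S data.

```python
import numpy as np

# Hypothetical sizes; the abstract does not specify any of them.
N_SHAPE_PARAMS = 8       # assumed length of the glottal shape descriptor
N_PRESSURE_POINTS = 20   # assumed number of pressure sample locations


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights for a fully connected network."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params


def predict(params, shapes):
    """Forward pass: shape vectors -> (flow rate, pressure distribution)."""
    h = np.atleast_2d(shapes)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers (assumed)
    w, b = params[-1]
    out = h @ w + b                      # linear output layer
    # First output column is the flow rate, the rest is the pressure profile.
    return out[:, 0], out[:, 1:]


params = init_mlp([N_SHAPE_PARAMS, 64, 64, 1 + N_PRESSURE_POINTS])
shapes = np.random.default_rng(1).uniform(size=(5, N_SHAPE_PARAMS))
flow_rate, pressure = predict(params, shapes)
```

In an actual workflow the weights would be fitted to the shape library and its N–S ground truth, and the trained `predict` call would stand in for the flow solver inside each FSI time step.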
