Post by account_disabled on Mar 7, 2024 5:47:55 GMT -4
For convenience, much of the literature also treats Mandarin initials and finals as phoneme-level units. The phonetic context affects how the current central phoneme is pronounced: neighboring phones change its acoustic signal through co-articulation, so it differs from the same phoneme pronounced in isolation. Monophone modeling does not take this co-articulation effect into account. To capture it, practical systems use context-dependent phonemes, also known as "triphones", as the basic unit for acoustic modeling; that is, the model considers the phoneme before and the phoneme after the current one, which makes its description more accurate.

Fine-grained triphone modeling requires a large amount of training data, and in practice data for some triphones is hard to obtain. Precise modeling also produces a huge number of modeling units: a phoneme table with N phonemes yields up to N × N × N triphones, so the total number of model parameters grows dramatically. Strictly precise triphone modeling is therefore not realistic, and state tying (binding) strategies are commonly used to reduce the number of modeling units. Typical tying methods include model-based tying and decision-tree clustering.

The following focuses on three types of acoustic model: the GMM-HMM-based acoustic model, the DNN-HMM-based acoustic model, and the end-to-end model.

The GMM-HMM-based acoustic model

The HMM is a statistical model, developed from Markov chains, that describes a doubly stochastic process. Its theoretical basis was established by Baum et al., and it was then applied to speech recognition by CMU's Baker and IBM's Jelinek, among others. L. R. Rabiner, Juang, et al. further promoted the