1. These notes are based on experiments with the LEON library only. However, we
assume that they also hold for LOLA and MIRIAM, since they concern the
synthesis itself.
2. These observations primarily concern the acoustic parameters of synthesis
control and do not deal with the phonetic or linguistic peculiarities of
Vocaloid.
3. Yamaha Vocaloid is in fact a special synthesizer whose control parameters
are much more complicated than, and different from, those of ordinary
synthesizers. Vocaloid parameters can be regarded as purely acoustic and are
not meant to be user friendly. For example, if you increase the Amplitude of a
Resonance at will, you will overload the output, although that could be avoided
if a change in one parameter were automatically compensated by a change in
another. Since all Vocaloid parameters are closely interconnected, we do not
give the exact parameter values necessary to achieve a certain effect; we only
point in the direction of their change.
How can I export a Vocaloid User Dictionary that I have created to another PC
(e.g. to share it with a friend), or import someone else's user dictionary to
my PC?
A user-created dictionary is saved as a <file name>.udc file in the UDIC folder
inside the VOCALOID directory under PROGRAM FILES. Just copy this file to any
storage device such as flash memory (the file is small enough that you can even
email it) and then paste it into the corresponding UDIC folder on the other PC.
Note that the UDIC folder is normally hidden, so to see it you have to enable
the SHOW HIDDEN FILES option.
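The copy step can also be scripted. The following is only a minimal Python
sketch and is not part of Vocaloid; the function name export_user_dictionary
and the example paths in the comment are assumptions made for illustration, so
substitute the real location of your UDIC folder.

```python
import shutil
from pathlib import Path


def export_user_dictionary(udic_dir, dest_dir):
    """Copy every .udc file from udic_dir into dest_dir.

    Returns the sorted list of copied file names.
    Both directory arguments are supplied by the caller; nothing here
    knows where Vocaloid is actually installed.
    """
    src = Path(udic_dir)
    dst = Path(dest_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for udc_file in sorted(src.glob("*.udc")):
        shutil.copy2(udc_file, dst / udc_file.name)  # copy2 keeps timestamps
        copied.append(udc_file.name)
    return copied


# Hypothetical usage (both paths are assumptions, adjust them to your system):
# export_user_dictionary(r"C:\Program Files\VOCALOID\UDIC", r"E:\vocaloid_udic")
```

To import someone's dictionary, call the same function with the two folders
swapped.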
Why does the result have little in common with live singing?
Indeed, unless you add Vibrato, Attack and other "expressive elements",
Vocaloid sounds very far from natural. In fact, without introducing "expressive
elements" it only provides "correct static synthesis of vocal speech", i.e.
correct melody and correct speech. However:
1) No vocalist in the world can produce a static sound. An amateur singer will
have plenty of instabilities and distortions in pitch, volume and spectrum. A
professional will also have a lot of instabilities, but they will be the
aforementioned "expressive elements", which "correct static synthesis of vocal
speech" lacks.
2) Every professional singer, and even an amateur vocalist, has his or her own
unique performing nuances. In terms of acoustic parameters, these are
coordinated changes in Pitch, Spectrum, Volume accents and breathing, i.e. not
a synthesis of a static sound. Besides, very often these changes are slight and
go unnoticed by an untrained ear. However, as in cooking, a tiny amount of
spice may go unnoticed, but without it the food seems less delicious.
3) "Expressive elements" are more important for "creating" a performer than his
or her initial samples/spectrum. We would never recognize Tom Jones, Mariah
Carey, Sting or Celine Dion if their voices were deprived of the "expressive
elements" while the static samples/spectrum remained intact. Generally
speaking, the result you get in Vocaloid without Vibrato, Attack and other
additions is an illustrative example of this.
What does "Resonances" mean?
When different vowels are produced (synthesized), the human vocal cords
generate perfectly identical oscillations, i.e. the waves for A, E, I, O, U are
the same, but the filtering of these waves by the speech resonators gives each
vowel its characteristic nuance. We may consider the vowels in Vocaloid as waves
already subjected to this filtering, and Resonances as additional filters that
let you correct the sounding nuance. Vocaloid has four such filters (speech
resonators) with frequencies from about 350 to 3500 Hz. The first one is the
lowest and the fourth one is the highest. Every filter has three parameters:
Frequency, Width and Amplitude. Those familiar with classic synthesizers such
as the Moog can imagine having four reconfigurable filters instead of one
(the so-called Moog filter or bandpass filter).
Increasing the Frequency value raises the filter frequency and, accordingly,
decreasing this value lowers it.
Increasing the Width value broadens the bandwidth (i.e. more harmonics fall
within the active zone of the filter) and, accordingly, decreasing it narrows
the bandwidth.
Increasing the Amplitude value allows more harmonics to fall within the active
zone of the filter and, accordingly, decreasing it reduces the number of such
harmonics.
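The effect of Width and Amplitude on the "active zone" can be illustrated with
a small Python sketch. It is only a toy model under stated assumptions: the
bell-shaped (Gaussian) gain curve, the 0.5 threshold that defines the active
zone, and all numeric values are chosen for illustration and do not describe
Vocaloid's actual filters.

```python
import math


def resonance_gain(freq_hz, center_hz, width_hz, amplitude):
    """Gain of one bell-shaped resonance at freq_hz (illustrative model only)."""
    return amplitude * math.exp(-0.5 * ((freq_hz - center_hz) / width_hz) ** 2)


def harmonics_in_active_zone(f0_hz, center_hz, width_hz, amplitude,
                             threshold=0.5, max_hz=5000.0):
    """Return the harmonic numbers of f0_hz whose gain reaches the threshold.

    The threshold is an assumed definition of the filter's "active zone".
    """
    active = []
    n = 1
    while n * f0_hz <= max_hz:
        gain = resonance_gain(n * f0_hz, center_hz, width_hz, amplitude)
        if gain >= threshold:
            active.append(n)
        n += 1
    return active


# A 200 Hz voice and a resonance centered at 2500 Hz (assumed values):
narrow = harmonics_in_active_zone(200.0, 2500.0, width_hz=200.0, amplitude=1.0)
wide = harmonics_in_active_zone(200.0, 2500.0, width_hz=400.0, amplitude=1.0)
loud = harmonics_in_active_zone(200.0, 2500.0, width_hz=200.0, amplitude=2.0)
print(narrow, wide, loud)
```

In this model both a larger Width and a larger Amplitude put more harmonics
above the threshold, which matches the behaviour described above.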
By changing the Resonance parameters you can significantly change the character
of the sound, up to achieving special effects such as "Wah-Wah", "Robot Voice",
"Throat Singing" and so forth.
How can I get a sounding nuance, characteristic of professional academic
singers, i.e. "pressure singing"?
Simply put, we have two fundamentally
different ways of singing:
a) strong pressure singing
b) breathy singing (with weak pressure).
Acoustically, "strong pressure" shows in the predominance of the "high singing
formant" at about 2500 Hz. In fact, the academic manner of singing distorts the
vowels compared to ordinary speech. To get such an effect in Vocaloid you have
to increase the Amplitude of Resonances 3 and 4 while decreasing the Width of
these Resonances. Note that intelligibility will decrease, as it does in real
academic singing. Also take into account that the same presets do not give a
satisfactory result across different ranges. Accordingly, they must be
corrected for the medium-high and low
registers. (You can try the settings used for our example "The Phantom of the
Opera" at the link
How can I get a sounding nuance, characteristic of Pop and Soul singers, i.e.
"breathy singing"?
This effect is more difficult to achieve with the LEON library, though perhaps
the initial material was not recorded in the "strong pressure singing" manner.
You have to decrease the Amplitude of Resonances 3 and 4 while increasing the
Width (which broadens the band) of these same Resonances. You should also
increase the values of Noise and Gender Factor. Also take into account that the
same presets do not give a satisfactory result across different ranges.
Accordingly, they must be corrected for the medium-high and low
registers. (You can try the settings used for our example "Touch Me Lola" at the
How can I make Vocaloid sing a vowel in
one-syllable words ending in a consonant like MAN or SUN on several notes?
Write the word on the first note and add a hyphen to the end of the word. On
every following note write a hyphen (-), then on the last note of the melisma
write a forward slash (/). For example, "sun" would be written
sun- - - - /
Alternatively, for words that are not in Vocaloid's dictionary, you can enter
phonemes, with [sV] on the first syllable, [V] repeated on all the following
syllables and [Vn] on the closing syllable.
You can also solve this task using Pitch drawing; however, in this case the
sound will acquire certain spectral distortions similar to those introduced
into a real voice by Auto-Tune tools.
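The hyphen-and-slash notation and the phoneme alternative described above can
be sketched as two small Python helpers. The function names are hypothetical,
the phoneme strings are simply joined the way they are written in this answer,
and the behaviour for one or two notes is an assumption, since only the longer
case is described.

```python
def melisma_lyrics(word, note_count):
    """Spread a one-syllable word over note_count notes using '-' and '/'."""
    if note_count < 1:
        raise ValueError("note_count must be at least 1")
    if note_count == 1:
        return [word]  # no melisma needed for a single note
    # First note: word plus hyphen; middle notes: '-'; last note: '/'.
    return [word + "-"] + ["-"] * (note_count - 2) + ["/"]


def melisma_phonemes(onset, vowel, coda, note_count):
    """Phoneme entries per note: onset+vowel, vowel..., vowel+coda."""
    if note_count < 2:
        raise ValueError("a melisma needs at least 2 notes")
    return [onset + vowel] + [vowel] * (note_count - 2) + [vowel + coda]


print(" ".join(melisma_lyrics("sun", 5)))        # sun- - - - /
print(melisma_phonemes("s", "V", "n", 4))        # ['sV', 'V', 'V', 'Vn']
```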
Can I make Vocaloid sing in a language
other than English or Japanese?
That is possible, at least for European languages. However, because the
phonemes of different languages differ, you will unavoidably get a foreign
accent: the greater the phonemic difference, the stronger the accent. The end
result will resemble a native song performed by a foreign artist who does not
actually understand the lyrics he or she sings, as sometimes happens. To do
that you will need to enter each phoneme manually, choosing from the Vocaloid
Phoneme Editor the one that sounds closest to the phoneme of your language. It
is better to avoid entering similar-sounding English words in the Lyrics view
(e.g. "tall car" instead of the Russian word "tolko" ("just")), as this will
only increase the accent. That will require a lot of experimenting, and often
compromise, but the result can be rewarding, though sometimes you will need a
sense of humor to fully appreciate
it. You can listen to Vocaloid singing in Russian here (
Why is volume more affected by Harmonics than by spectral frequency contents?
Indeed, changes in Harmonics lead to proportional changes in Volume, so this
parameter can be used to change Volume. To change the spectral frequency
content you have to increase/decrease the Amplitude of the Resonances
simultaneously with increasing/decreasing Harmonics.
How can I increase intelligibility?
In general, intelligibility depends on the Volume balance between consonants
and vowels. To shift this balance towards consonants you can increase the value
of the Noise parameter at the beginning and at the end of a syllable (a note),
where the consonants are usually situated, and also decrease the Amplitude of
the Resonances in the same positions.
Why are Pitch changes in Vocaloid not as noticeable as in an ordinary synthesizer?
The point is that the Pitch parameter, as a tone-changing instrument, mainly
influences sound that has a clearly distinguishable pitch, i.e. the vowels in
the case of Vocaloid. Pitch influences consonants to a much lesser extent. And
since most syllables include both vowels and consonants, Pitch does not affect
the consonant zones, so the consonants conceal the influence of Pitch.
Why do short notes sound in staccato manner?
Normally, the tempo of speech (and singing) does not affect the length of
consonants as strongly as it affects the length of vowels, i.e. in a short
syllable the consonants will sound relatively longer than in the same syllable
sung on a longer note. By creating a succession of short notes in a vocal, you
leave little space for vowels, which are essentially what is perceived as sung
notes. That is why there are places with no tone between these consonants (at
the boundaries between notes), which creates a staccato-like effect.
By the way, it seems that Vocaloid has some more serious restrictions here. For
example, if you try to sing the word STRANGE on four 1/16 notes (legato, in a
row) at Tempo=80, only the first and the second notes will sound quite
correctly, while the 3rd and the 4th notes will not sound at all.
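To see how little time each note gets in the STRANGE example, the note length
can be worked out from the tempo. This is a minimal Python sketch that assumes
the tempo counts quarter notes per minute; the function name is hypothetical.

```python
def note_duration_ms(tempo_bpm, note_fraction):
    """Duration in milliseconds of a note lasting note_fraction of a whole note.

    Assumes tempo_bpm is the number of quarter notes per minute.
    """
    quarter_ms = 60000.0 / tempo_bpm       # one quarter note
    return quarter_ms * 4.0 * note_fraction  # a whole note is four quarters


sixteenth = note_duration_ms(80, 1 / 16)
print(sixteenth)      # 187.5 ms per 1/16 note at Tempo=80
print(4 * sixteenth)  # 750.0 ms for all four notes
```

Under this assumption each 1/16 note lasts under a fifth of a second, and that
time has to hold all the consonants of the syllable as well as the vowel.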
Which acoustic parameters change when "Vibrato" is added?
"Vibrato" in Vocaloid is implemented as a "complex object" that includes
periodic modulation not only of frequency, but of Volume and Spectrum as well.
In that sense Vocaloid Vibrato is closer to the real vibrato of a live vocalist
than to the Modulation parameter of ordinary synthesizers, where you choose
between Pitch Modulation, Amplitude Modulation or Spectrum Filter Frequency
Modulation.
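The idea of a single "complex object" that modulates several parameters at
once can be sketched in Python as follows. This is a toy illustration, not
Vocaloid's implementation: the shared sine-wave modulator, the function name,
the "brightness" stand-in for Spectrum and all default depths and rates are
assumptions.

```python
import math


def vibrato_frame(t_sec, rate_hz=5.5, pitch_depth_cents=50.0,
                  volume_depth=0.1, brightness_depth=0.1):
    """Return (pitch_offset_cents, volume_gain, brightness_gain) at t_sec.

    One periodic modulator drives all three values together, instead of
    the user picking a single modulation target.
    """
    phase = math.sin(2.0 * math.pi * rate_hz * t_sec)
    pitch_offset_cents = pitch_depth_cents * phase
    volume_gain = 1.0 + volume_depth * phase
    brightness_gain = 1.0 + brightness_depth * phase
    return pitch_offset_cents, volume_gain, brightness_gain


# Sample the first 0.2 seconds in 50 ms steps.
for step in range(5):
    print(vibrato_frame(step * 0.05))
```

An ordinary synthesizer's modulation would correspond to keeping only one of
the three returned values and leaving the other two constant.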
Which acoustic parameters change when "Attack" is added?
"Attack" in Vocaloid is implemented as a "complex object" that includes
non-periodic modulation not only of frequency but of Volume and Spectrum as
well. This is close to the real accents a live vocalist uses to mark stressed
sounds, as well as to characteristic techniques such as a micro-approach to a
note or its melisma (mordent etc.).