Speech Synthesis Markup Language (SSML)

You can send Speech Synthesis Markup Language (SSML) in your Text-to-Speech request to customize the audio response in detail. SSML lets you specify pauses and control how acronyms, dates, times, abbreviations, or text that should be censored are rendered as audio. For more information and code samples, see the Text-to-Speech SSML tutorial.

The following example shows SSML markup that Text-to-Speech can synthesize:

<speak>
  Here are <say-as interpret-as="characters">SSML</say-as> samples.
  I can pause <break time="3s"/>.
  I can play a sound
  <audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
  I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
  Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
  Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
  I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
  Finally, I can speak a paragraph with two sentences.
  <p><s>This is sentence one.</s><s>This is sentence two.</s></p>
</speak>

Text-to-Speech synthesizes the example SSML document as follows:

Here are S S M L samples. I can pause [3 second pause]. I can play a sound [audio file plays].
I can speak in cardinals. Your number is ten.
Or I can speak in ordinals. You are tenth in line.
Or I can even speak in digits. The digits for ten are one oh.
I can also substitute phrases, like the World Wide Web Consortium.
Finally, I can speak a paragraph with two sentences. This is sentence one. This is sentence two.

Text-to-Speech supports a subset of the available SSML tags, which are described in this topic.

For more information about creating audio data from SSML input with Text-to-Speech, see Creating voice audio files.

Tips for using SSML

Depending on your implementation, you might need to escape quotation marks in the SSML payload that you send to Text-to-Speech. The following example shows how to format SSML input that is included in a JSON object.

{
  "input": {
    "ssml": "<speak>The <say-as interpret-as=\"characters\">SSML</say-as> standard <break time=\"1s\"/>is defined by the <sub alias=\"World Wide Web Consortium\">W3C</sub>.</speak>"
  },
  "voice": {
    "languageCode": "en-us",
    "name": "en-US-Standard-B",
    "ssmlGender": "MALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
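If you build the request in code, a JSON serializer can handle the quote escaping for you. The following is a minimal Python sketch that assumes the same request fields as the example above; `build_request` is a hypothetical helper name, not part of any client library.

```python
import json


def build_request(ssml: str) -> str:
    """Serialize a synthesis request body; json.dumps escapes the inner quotes."""
    body = {
        "input": {"ssml": ssml},
        "voice": {
            "languageCode": "en-us",
            "name": "en-US-Standard-B",
            "ssmlGender": "MALE",
        },
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return json.dumps(body, indent=2)


# Write the SSML with ordinary quotes; no manual backslashes are needed.
ssml = (
    '<speak>The <say-as interpret-as="characters">SSML</say-as> standard '
    '<break time="1s"/>is defined by the '
    '<sub alias="World Wide Web Consortium">W3C</sub>.</speak>'
)
payload = build_request(ssml)
```

In the serialized `payload`, each attribute quote appears as `\"`, which matches the hand-escaped form.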

Reserved characters

Avoid using SSML reserved characters in the text that is converted to audio. If you need to use a reserved character, use its escape code so that the character is not read as markup. The following table shows the reserved SSML characters and their escape codes.

Character Escape code
" &quot;
& &amp;
' &apos;
< &lt;
> &gt;
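One way to apply these escape codes is to pass user-supplied text through an XML escaping routine before wrapping it in SSML. The sketch below uses Python's standard `xml.sax.saxutils.escape`; the helper names `escape_ssml_text` and `speak` are illustrative assumptions.

```python
from xml.sax.saxutils import escape

# escape() already handles &, <, and >; the two quote characters are added here.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_ssml_text(text: str) -> str:
    """Replace the five reserved characters with their escape codes."""
    return escape(text, _QUOTES)


def speak(text: str) -> str:
    """Wrap plain text in a <speak> element after escaping it."""
    return f"<speak>{escape_ssml_text(text)}</speak>"


print(speak('AT&T says "5 < 6"'))
# prints <speak>AT&amp;T says &quot;5 &lt; 6&quot;</speak>
```

Escape only the text content, not the SSML tags themselves, otherwise the tags are spoken as literal text.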

Select a voice

You can set the voice in the VoiceSelectionParams object. For a code sample that uses the VoiceSelectionParams object, see the Text-to-Speech SSML guide.

You can use the <voice> tag to read SSML in multiple voices, but you must set the VoiceSelectionParams name to a compatible voice:

Requested voice type Supported voice types in the <voice> tag
Neural2 Studio Wavenet News Standard
Neural2 ✔ ✔ ✔
Studio ✔ ✔ ✔
Wavenet ✔ ✔ ✔
Standard ✔ ✔ ✔
News ✔ ✔ ✔

Support for SSML elements

The following sections describe the SSML elements and options that you can use.

<speak>

The root element of the SSML response.

To learn more about the speak element, see the W3 specification.

Example

<speak>
  my SSML content
</speak>

<break>

An empty element that controls pauses or other prosodic boundaries between words. Using <break> between any pair of tokens is optional. If this element is absent between words, the break is determined automatically from the linguistic context.

To learn more about the break element, see the W3 specification.

Attributes

Attribute Description
time

Sets the length of the break in seconds or milliseconds (for example, "3s" or "250ms").

strength

Sets the strength of the output's prosodic break in relative terms. Valid values are "x-weak", "weak", "medium", "strong", and "x-strong". The value "none" indicates that no prosodic break boundary should be output, which you can use to prevent a prosodic break that the processor would otherwise produce. The other values indicate monotonically non-decreasing break strength between tokens. Stronger boundaries are typically accompanied by pauses.

Example

The following example shows how to use the <break> element to pause between steps:

<speak>
  Step 1, take a deep breath. <break time="200ms"/>
  Step 2, exhale.
  Step 3, take a deep breath again. <break strength="weak"/>
  Step 4, exhale.
</speak>
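When the steps come from a list in code, the pauses can be inserted programmatically. This is a small Python sketch under the assumption that each step is already escaped text; `steps_to_ssml` is a hypothetical helper.

```python
def steps_to_ssml(steps, pause="200ms"):
    """Join spoken steps, inserting a timed <break> between consecutive steps."""
    separator = f' <break time="{pause}"/>\n  '
    return "<speak>\n  " + separator.join(steps) + "\n</speak>"


ssml = steps_to_ssml(
    ["Step 1, take a deep breath.", "Step 2, exhale."],
    pause="250ms",
)
```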

<say-as>

This element lets you indicate the type of text construct contained within the element. It also helps specify the level of detail for rendering the contained text.

The <say-as> element has a required attribute, interpret-as, which determines how the value is spoken. The optional attributes format and detail can be used depending on the interpret-as value.

Examples

The interpret-as attribute supports the following values:

  • currency

    The following example is spoken as "forty two dollars and one cent". If the language attribute is omitted, the current locale is used.

    <speak>
      <say-as interpret-as='currency' language='en-US'>$42.01</say-as>
    </speak>
        
  • telephone

    See the description of interpret-as='telephone' in the W3C SSML 1.0 say-as attribute values WG note.

    The following example is spoken as "one eight zero zero two zero two one two one two". If the google:style attribute is omitted, zero is spoken as the letter O.

    The google:style='zero-as-zero' attribute currently works only in EN locales.

          <speak>
            <say-as interpret-as='telephone' google:style='zero-as-zero'>1800-202-1212</say-as>
          </speak>
        
  • verbatim or spell-out

    The following example is spelled out letter by letter:

    <speak>
      <say-as interpret-as="verbatim">abcdefg</say-as>
    </speak>
        
  • date

    The format attribute is a sequence of date field character codes. The supported field character codes in format are {y, m, d} for year, month, and day (of the month), respectively. If a field code appears once for year, month, or day, the expected number of digits is 4, 2, and 2, respectively. If a field code is repeated, the expected number of digits is the number of times the code is repeated. Fields in the date text can be separated by punctuation or spaces.

    The detail attribute controls the spoken form of the date. For detail='1', only the day field and one of the month or year fields are required, although both can be supplied. This is the default when fewer than three fields are given. The spoken form is "The {ordinal day} of {month}, {year}".

    The following example is spoken as "The tenth of September, nineteen sixty":

    <speak>
      <say-as interpret-as="date" format="yyyymmdd" detail="1">
        1960-09-10
      </say-as>
    </speak>
        

    The following example is spoken as "The tenth of September":

    <speak>
      <say-as interpret-as="date" format="dm">10-9</say-as>
    </speak>
        

    For detail='2', the day, month, and year fields are required; this is the default when all three fields are supplied. The spoken form is "{month} {ordinal day}, {year}".

    The following example is spoken as "September tenth, nineteen sixty":

    <speak>
      <say-as interpret-as="date" format="dmy" detail="2">
        10-9-1960
      </say-as>
    </speak>
        
  • characters

    The following example is spoken letter by letter, as "C A N":

    <speak>
      <say-as interpret-as="characters">can</say-as>
    </speak>
        
  • cardinal

    The following example is spoken as a cardinal number ("twelve thousand three hundred forty five" in US English):

    <speak>
      <say-as interpret-as="cardinal">12345</say-as>
    </speak>
        
  • ordinal

    The following example is spoken as an ordinal number ("first"):

    <speak>
      <say-as interpret-as="ordinal">1</say-as>
    </speak>
        
  • fraction

    The following example is spoken as a fraction ("five and a half"):

    <speak>
      <say-as interpret-as="fraction">5+1/2</say-as>
    </speak>
        
  • expletive or bleep

    The following example is output as a beep, as though the text has been censored:

    <speak>
      <say-as interpret-as="expletive">censor this</say-as>
    </speak>
        
  • unit

    Converts units to singular or plural depending on the number. The following example is spoken with the unit in its plural form ("ten feet"):

    <speak>
      <say-as interpret-as="unit">10 foot</say-as>
    </speak>
        
  • time

    The following example is spoken as "Two thirty P.M.":

    <speak>
      <say-as interpret-as="time" format="hms12">2:30pm</say-as>
    </speak>
        

    The format attribute is a sequence of time field character codes. The supported field character codes in format are {h, m, s, Z, 12, 24} for hour, minute (of the hour), second (of the minute), time zone, 12-hour time, and 24-hour time, respectively. If a field code appears once for hour, minute, or second, the expected number of digits is 1, 2, and 2, respectively. If a field code is repeated, the expected number of digits is the number of times the code is repeated. Fields in the time text can be separated by punctuation or spaces. If hour, minute, or second is not specified in the format, or there are no matching digits, the field is treated as zero. The default format is "hms12".

    The detail attribute controls whether the time is spoken in 12-hour or 24-hour form. The spoken form is 24-hour time if detail='1', or if detail is omitted and the time format is 24-hour. The spoken form is 12-hour time if detail='2', or if detail is omitted and the time format is 12-hour.

To learn more about the say-as element, see the W3 specification.
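The <say-as> examples above can also be generated from code. The following Python sketch builds the element with the standard `xml.etree.ElementTree` module so that text and attribute values are escaped automatically; the `say_as` helper and the sample values are assumptions made for illustration.

```python
import datetime
import xml.etree.ElementTree as ET


def say_as(text, interpret_as, **attrs):
    """Build a <say-as> element; ElementTree escapes text and attribute values."""
    element = ET.Element("say-as", {"interpret-as": interpret_as, **attrs})
    element.text = text
    return ET.tostring(element, encoding="unicode")


# A date rendered with the yyyymmdd format and detail level 1.
birthday = datetime.date(1960, 9, 10)
date_tag = say_as(birthday.strftime("%Y-%m-%d"), "date", format="yyyymmdd", detail="1")

# A 12-hour time using the default format code.
time_tag = say_as("2:30pm", "time", format="hms12")
```

Place the returned strings inside a <speak> element before sending them in a request.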

<audio>

Supports the insertion of recorded audio files and other audio formats in conjunction with synthesized speech output.

Attributes

Attribute Required Default Values
src yes n/a A URI that refers to the audio media source. The supported protocol is https.
clipBegin no 0 A TimeDesignation that is the offset from the beginning of the audio source at which playback starts. If this value is greater than or equal to the audio source's actual duration, no audio is inserted.
clipEnd no infinity A TimeDesignation that is the offset from the beginning of the audio source at which playback ends. If the audio source's actual duration is less than this value, playback ends at that time. If clipBegin is greater than or equal to clipEnd, no audio is inserted.
speed no 100% The ratio of the output playback rate to the normal input rate, expressed as a percentage. The format is a positive real number followed by %. The currently supported range is [50% (slow, half speed), 200% (fast, double speed)]. Values outside that range may or may not be adjusted to fall within it.
repeatCount no 1, or 10 if repeatDur is set A real number that specifies how many times to insert the audio (after clipping, if any, by clipBegin or clipEnd). Fractional repetitions are not supported, so the value is rounded to the nearest integer. Zero is not a valid value; it is treated as unspecified, and the default value applies.
repeatDur no infinity A TimeDesignation that limits the duration of the inserted audio after the source has been processed for the clipBegin, clipEnd, repeatCount, and speed attributes (rather than the normal playback duration). If the duration of the processed audio is less than this value, playback ends at that time.
soundLevel no +0dB Adjusts the sound level of the audio by soundLevel decibels. The maximum range is +/-40dB, but the effective range may be smaller, and output quality may not be good across the entire range.

The following settings are currently supported for audio:

  • Format: MP3 (MPEG v2)
    • 24K samples per second
    • 24K to 96K bits per second, fixed rate
  • Format: Opus in Ogg
    • 24K samples per second (super-wideband)
    • 24K to 96K bits per second, fixed rate
  • Format (deprecated): WAV (RIFF)
    • PCM 16-bit signed, little endian
    • 24K samples per second
  • For all formats:
    • A single channel is preferred, but stereo is acceptable.
    • Maximum duration: 240 seconds. To play audio with a longer duration, consider implementing a media response.
    • File size limit: 5 megabytes.
    • The source URL must use the HTTPS protocol.
    • The UserAgent used when fetching the audio is "Google-Speech-Actions".
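Some of these constraints can be checked on the client before a request is sent. The sketch below, in Python, rejects non-HTTPS sources and clamps the speed attribute to the documented 50% to 200% range. Clamping on the client is a design choice made here, since the service may or may not adjust out-of-range values, and `audio_tag` is a hypothetical helper.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape, quoteattr

MIN_SPEED, MAX_SPEED = 50.0, 200.0  # documented speed range, in percent


def audio_tag(src, speed=100.0, fallback=""):
    """Return an <audio> element, enforcing https and clamping speed."""
    if urlparse(src).scheme != "https":
        raise ValueError("src must be an https URL")
    clamped = min(MAX_SPEED, max(MIN_SPEED, speed))
    # quoteattr adds the surrounding quotes and escapes the URL if needed.
    return f'<audio src={quoteattr(src)} speed="{clamped:g}%">{escape(fallback)}</audio>'


tag = audio_tag("https://www.example.com/MY_MP3_FILE.mp3", speed=300)
```

The file size, duration, and encoding limits cannot be verified from the URL alone and would need a separate check of the file itself.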

The contents of the <audio> element are optional and are used if the audio file cannot be played or if the output device does not support audio. The contents can include a <desc> element, in which case the text contents of that element are used for display. For more information, see the Recorded Audio section of the Responses Checklist.

The src URL must also be an https URL (Google Cloud Storage can host your audio files at an https URL).

To learn more about media responses, see the Media response section of the Responses guide.

To learn more about the audio element, see the W3 specification.

Example

<speak>
  <audio src="cat_purr_close.ogg">
    <desc>a cat purring</desc>
    PURR (sound didn't load)
  </audio>
</speak>

<p>,<s>

Sentence and paragraph elements.

To learn more about the p and s elements, see the W3 specification.

Example

<p><s>This is sentence one.</s><s>This is sentence two.</s></p>

Best practices

  • Use <s>...</s> tags to wrap full sentences, especially if they contain SSML elements that change prosody (that is, <audio>, <break>, <emphasis>, <par>, <prosody>, <say-as>, <seq>, and <sub>).
  • If a break in speech is intended to be long enough to be heard, use <s>...</s> tags and put the break between sentences.

<sub>

Indicates that the text in the alias attribute value replaces the contained text for pronunciation.

You can also use the sub element to provide a simplified pronunciation of a word that is difficult to read. The last example below demonstrates this use case in Japanese.

To learn more about the sub element, see the W3 specification.

Examples

<sub alias="World Wide Web Consortium">W3C</sub>
<sub alias="にっぽんばし">日本橋</sub>

<mark>

An empty element that places a marker into the text or tag sequence. You can use it to reference a specific location in the sequence or to insert a marker into an output stream for asynchronous notification.

To learn more about the mark element, see the W3 specification.

Example

<speak>
Go from <mark name="here"/> here, to <mark name="there"/> there!
</speak>

<prosody>

Used to customize the pitch, speaking rate, and volume of the text contained by the element. The rate, pitch, and volume attributes are currently supported.

You can set the rate and volume attributes according to the W3 specification. There are three options for setting the value of the pitch attribute:

Attribute Description
name

The string ID for each mark. (This attribute applies to the <mark> element.)

Option Description
Relative Specify a relative value (for example, "low", "medium", or "high"), where "medium" is the default pitch.
Semitones Increase or decrease the pitch by "N" semitones using "+Nst" or "-Nst", respectively. The "+/-" and "st" are required.
Percentage Increase or decrease the pitch by "N" percent using "+N%" or "-N%", respectively. The "%" is required, but "+/-" is optional.
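The three pitch options can be checked with a small validator before a value is placed in a <prosody> tag. This Python sketch makes two assumptions that the table does not state: the relative names are the set defined in the W3C SSML specification, and "N" may include a decimal fraction.

```python
import re

# Relative names as defined by the W3C SSML specification (assumed to apply here).
RELATIVE = {"x-low", "low", "medium", "high", "x-high", "default"}
SEMITONES = re.compile(r"[+-]\d+(\.\d+)?st")  # sign and "st" are required
PERCENT = re.compile(r"[+-]?\d+(\.\d+)?%")    # "%" is required, sign is optional


def is_valid_pitch(value: str) -> bool:
    """Return True if value matches one of the three pitch forms."""
    return (
        value in RELATIVE
        or SEMITONES.fullmatch(value) is not None
        or PERCENT.fullmatch(value) is not None
    )
```

For example, `is_valid_pitch("-2st")` and `is_valid_pitch("10%")` return True, while `is_valid_pitch("2st")` returns False because the sign is missing.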

To learn more about the prosody element, see the W3 specification.

Example

The following example uses the <prosody> element to speak slowly, at 2 semitones lower than normal:

<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>

<emphasis>

Used to add or remove emphasis from the text contained by the element. The <emphasis> element modifies speech similarly to <prosody>, but without the need to set individual speech attributes.

This element supports an optional "level" attribute with the following valid values:

  • strong
  • moderate
  • none
  • reduced

To learn more about the emphasis element, see the W3 specification.

Example

The following example uses the <emphasis> element to make an announcement:

<emphasis level="moderate">This is an important announcement</emphasis>

<par>

A parallel media container that lets you play multiple media elements at once. The only allowed content is a set of one or more <par>, <seq>, and <media> elements. The order of the <media> elements is not significant.

Unless a child element specifies a different begin time, the implicit begin time of the element is the same as that of the <par> container. If a child element has an offset value set for its begin or end attribute, the offset is relative to the begin time of the <par> container. For the root <par> element, the begin attribute is ignored, and the begin time is the moment when the SSML speech synthesis process starts generating output for the root <par> element (effectively time "zero").

Example

<speak>
  <par>
    <media xml:id="question" begin="0.5s">
      <speak>Who invented the Internet?</speak>
    </media>
    <media xml:id="answer" begin="question.end+2.0s">
      <speak>The Internet was invented by cats.</speak>
    </media>
    <media begin="answer.end-0.2s" soundLevel="-6dB">
      <audio
        src="https://actions.google.com/.../cartoon_boing.ogg"/>
    </media>
    <media repeatCount="3" soundLevel="+2.28dB"
      fadeInDur="2s" fadeOutDur="0.2s">
      <audio
        src="https://actions.google.com/.../cat_purr_close.ogg"/>
    </media>
  </par>
</speak>

<seq>

A sequential media container that lets you play media elements one after another. The only allowed content is a set of one or more <seq>, <par>, and <media> elements. The order of the media elements is the order in which they are rendered.

The begin and end attributes of child elements can be set to offset values (see Time specification below). These offset values are relative to the end of the previous element in the sequence or, for the first element in the sequence, relative to the beginning of its <seq> container.

Example

<speak>
  <seq>
    <media begin="0.5s">
      <speak>Who invented the Internet?</speak>
    </media>
    <media begin="2.0s">
      <speak>The Internet was invented by cats.</speak>
    </media>
    <media soundLevel="-6dB">
      <audio
        src="https://actions.google.com/.../cartoon_boing.ogg"/>
    </media>
    <media repeatCount="3" soundLevel="+2.28dB"
      fadeInDur="2s" fadeOutDur="0.2s">
      <audio
        src="https://actions.google.com/.../cat_purr_close.ogg"/>
    </media>
  </seq>
</speak>
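To make the offset rule for <seq> concrete, the following Python sketch computes when each child starts, given its begin offset and its duration. The durations are hypothetical inputs, since real durations depend on the synthesized speech and audio files, and the sketch ignores the end, repeatCount, and repeatDur attributes.

```python
def seq_start_times(children):
    """Return absolute start times for <seq> children.

    children is a list of (begin_offset_seconds, duration_seconds) pairs.
    Each begin offset is measured from the end of the previous child, or
    from the start of the container for the first child.
    """
    starts = []
    cursor = 0.0  # end time of the previous child
    for begin_offset, duration in children:
        start = cursor + begin_offset
        starts.append(start)
        cursor = start + duration
    return starts


# Offsets 0.5s, 2.0s, and 0s with assumed durations of 2s, 3s, and 1s.
timeline = seq_start_times([(0.5, 2.0), (2.0, 3.0), (0.0, 1.0)])
```

In a <par> container, by contrast, every offset is measured from the start of the container, so the start times would simply equal the begin offsets.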

<media>

Represents a media layer within a <par> or <seq> element. The allowed content of a <media> element is an SSML <speak> or <audio> element. The following table describes the valid attributes of a <media> element.

Attributes

Attribute Required Default Values
xml:id no no value A unique XML identifier for this element. Encoded entities are not supported. The allowed identifier values match the regular expression "([-_#]|\p{L}|\p{D})+". For more information, see XML-ID.
begin no 0 The begin time of this media container. Ignored if this is the root media container element (treated the same as the default of "0"). For valid string values, see the Time specification section below.
end no no value A specification of the end time of this media container. For valid string values, see the Time specification section below.
repeatCount no 1 A real number that specifies how many times to insert the media. Fractional repetitions are not supported, so the value is rounded to the nearest integer. Zero is not a valid value; it is treated as unspecified, and the default value applies.
repeatDur no no value A TimeDesignation that limits the duration of the inserted media. If the duration of the media is less than this value, playback ends at that time.
soundLevel no +0dB Adjusts the sound level of the audio by soundLevel decibels. The maximum range is +/-40dB, but the effective range may be smaller, and output quality may not be good across the entire range.
fadeInDur no 0s A TimeDesignation over which the media fades in from silence to the optionally specified soundLevel. If the duration of the media is less than this value, the fade-in stops at the end of playback and the sound level does not reach the specified level.
fadeOutDur no 0s A TimeDesignation over which the media fades out from the optionally specified soundLevel to silence. If the duration of the media is less than this value, the sound level is set to a lower value so that silence is reached at the end of playback.

Time specification

A time specification, used for the value of the `begin` and `end` attributes of <media> elements and media containers (<par> and <seq> elements), is either an offset value (for example, +2.5s) or a syncbase value (for example, foo_id.end-250ms).

  • Offset value - A time offset value is an SMIL Timecount value that allows values matching the regular expression "\s*(\+|-)?\s*(\d+)(\.\d+)?(h|min|s|ms)?\s*".

    The first digit string is the whole part of the decimal number, and the second digit string is the fractional part. The default sign (that is, "(\+|-)?") is "+". The unit values correspond to hours, minutes, seconds, and milliseconds, respectively. The default unit is "s" (seconds).

  • Syncbase value - A syncbase value is an SMIL syncbase value that allows values matching the regular expression "([-_#]|\p{L}|\p{D})+\.(begin|end)\s*(\+|-)\s*(\d+)(\.\d+)?(h|min|s|ms)?\s*".

    The digits and units are interpreted in the same way as for an offset value.
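The two forms can be parsed with regular expressions along the lines of the ones quoted above. The Python sketch below converts a time specification to seconds. It is an approximation: Python's `re` module does not support `\p{L}`, so the identifier part is matched with `[-\w#]+` instead, and the function names are hypothetical.

```python
import re

_NUMBER = r"(\d+)(\.\d+)?(h|min|s|ms)?"
OFFSET = re.compile(r"\s*([+-])?\s*" + _NUMBER + r"\s*")
# [-\w#]+ approximates the identifier pattern; Python's re has no \p{L}.
SYNCBASE = re.compile(r"([-\w#]+)\.(begin|end)\s*([+-])\s*" + _NUMBER + r"\s*")
UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}


def _seconds(sign, whole, fraction, unit):
    """Combine the matched parts; the unit defaults to seconds, the sign to +."""
    value = float(whole + (fraction or "")) * UNIT_SECONDS[unit or "s"]
    return -value if sign == "-" else value


def parse_offset(text):
    """Parse an offset value such as '+2.5s' into seconds."""
    match = OFFSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not an offset value: {text!r}")
    return _seconds(*match.groups())


def parse_syncbase(text):
    """Parse 'foo_id.end-250ms' into (element id, 'begin' or 'end', seconds)."""
    match = SYNCBASE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a syncbase value: {text!r}")
    element_id, edge, *rest = match.groups()
    return element_id, edge, _seconds(*rest)
```

For example, `parse_syncbase("question.end+2.0s")` returns `("question", "end", 2.0)`.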

<phoneme>

You can use the <phoneme> tag to produce custom pronunciations of words inline. Text-to-Speech accepts the IPA and X-SAMPA phonetic alphabets. For the list of supported languages and phonemes, see the phonemes page.

Each application of the <phoneme> tag directs the pronunciation of a single word:

  <phoneme alphabet="ipa" ph="ˌmænɪˈtoʊbə">manitoba</phoneme>
  <phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>mahogany</phoneme>

Stress markers

A transcription can mark up to three levels of stress:

  1. Primary stress: denoted with /ˈ/ in IPA and /"/ in X-SAMPA.
  2. Secondary stress: denoted with /ˌ/ in IPA and /%/ in X-SAMPA.
  3. Unstressed: not denoted with a symbol in either notation.

Some languages have fewer than three levels or do not mark stress at all. To see the stress levels available for your language, see the phonemes page. Stress markers are placed at the start of each stressed syllable. For example, in US English:

Example word IPA X-SAMPA
water ˈwɑːtɚ "wA:t@`
underwater ˌʌndɚˈwɑːtɚ %Vnd@"wA:t@

Broad vs. narrow transcriptions

As a general rule, keep your transcriptions broad and phonemic. For example, in US English, transcribe intervocalic /t/ instead of using a tap:

Example word IPA X-SAMPA
butter ˈbʌtɚ instead of ˈbʌɾɚ "bVt@` instead of "bV4@`

In some cases, using the phonemic representation makes the TTS result sound unnatural (for example, when the sequence of phonemes is anatomically difficult to pronounce).

One example is voicing assimilation for /s/ in English. In this case, reflect the assimilation in the transcription:

Example word IPA X-SAMPA
cats ˈkæts "k{ts
dogs ˈdɑːgz instead of ˈdɑːgs "dA:gz instead of "dA:gs

Reduction

Every syllable must contain one (and only one) vowel. This means that you should avoid syllabic consonants and instead transcribe them with a reduced vowel. For example:

Example word IPA X-SAMPA
kitten ˈkɪtən instead of ˈkɪtn "kIt@n instead of "kitn
kettle ˈkɛtəl instead of ˈkɛtl "kEt@l instead of "kEtl

Syllabification

You can optionally specify syllable boundaries by using /./. Each syllable must contain one (and only one) vowel. For example:

Example word IPA X-SAMPA
readability ˌɹiː.də.ˈbɪ.lə.tiː %r\i:.d@."bI.l@.ti:

Custom pronunciation dictionary

As an alternative to providing pronunciations inline with the phoneme tag, you can provide a dictionary of custom pronunciations in the speech synthesis RPC. When the custom pronunciation dictionary is included in the request, the input text is automatically transformed with SSML phoneme tags.

For example, the following request with text input and a custom pronunciation is automatically transformed into the equivalent SSML shown below it.

Original input:

input: {
  text: 'Hello world! It is indeed a beautiful world!',
  custom_pronunciations: {
    pronunciations: {
      phrase: 'world'
      phonetic_encoding: PHONETIC_ENCODING_IPA
      pronunciation: 'wɜːld'
    }
  }
}

Transformed input:

input: {
  ssml: '<speak>Hello <phoneme alphabet="ipa" ph="wɜːld">world</phoneme>! It is indeed a beautiful <phoneme alphabet="ipa" ph="wɜːld">world</phoneme>!</speak>'
}
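The transformation can be approximated on the client to preview what the service does with a pronunciation dictionary. The Python sketch below is an illustration only: it assumes case-sensitive, whole-word matching and IPA encoding, which may differ from the service's actual matching rules, and `apply_pronunciations` is a hypothetical name.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text, pronunciations):
    """Wrap each whole-word phrase in a <phoneme> tag and return SSML.

    pronunciations maps a phrase to its IPA transcription.
    """
    if not pronunciations:
        return f"<speak>{escape(text)}</speak>"
    alternatives = "|".join(re.escape(phrase) for phrase in pronunciations)
    pattern = re.compile(rf"\b({alternatives})\b")
    parts, cursor = [], 0
    for match in pattern.finditer(text):
        phrase = match.group(1)
        # Escape the plain text between matches, then emit the phoneme tag.
        parts.append(escape(text[cursor:match.start()]))
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(pronunciations[phrase])}>'
            f"{escape(phrase)}</phoneme>"
        )
        cursor = match.end()
    parts.append(escape(text[cursor:]))
    return "<speak>" + "".join(parts) + "</speak>"


ssml = apply_pronunciations(
    "Hello world! It is indeed a beautiful world!",
    {"world": "wɜːld"},
)
```

With this input, `ssml` equals the transformed input shown above.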

Durations

Text-to-Speech supports <say-as interpret-as="duration"> to read durations correctly. For example, the following is verbalized as "five hours and thirty minutes":

<say-as interpret-as="duration" format="h:m">5:30</say-as>

Format strings support the following values:

Abbreviation Value
h hours
m minutes
s seconds
ms milliseconds

<voice>

The <voice> tag lets you use more than one voice in a single SSML request. In the following example, the default voice is an English male voice. All words are synthesized in this voice except for "qu'est-ce qui t'amène ici", which is verbalized in French with a female voice instead of the default language (English) and gender (male).

<speak>And then she asked, <voice language="fr-FR" gender="female">qu'est-ce qui
t'amène ici</voice><break time="250ms"/> in her sweet and gentle voice.</speak>

Alternatively, you can use a <voice> tag to specify an individual voice (the voice name on the supported voices page) instead of specifying a language or gender:

<speak>The dog is friendly<voice name="fr-CA-Wavenet-B">mais la chat est
mignon</voice><break time="250ms"/> said a pet shop
owner</speak>

When you use the <voice> tag, Text-to-Speech expects either a name (the name of the voice that you want to use) or a combination of the following attributes. All three attributes are optional, but you must provide at least one of them if you do not provide a name.

  • gender: One of "male", "female", or "neutral".
  • variant: Used as a tiebreaker when your configuration matches more than one possible voice.
  • language: The language that you want. Only one language can be specified in a given <voice> tag. Specify the language in BCP-47 format. You can find the BCP-47 code for your language in the language code column on the supported voices and languages page.

You can also control the relative priority of the gender, variant, and language attributes with two additional attributes, required and ordering.

  • required: If an attribute is designated as required and is not configured properly, the request fails.
  • ordering: Any attributes listed after ordering are treated as preferred rather than required. The Text-to-Speech API considers preferred attributes on a best-effort basis, in the order in which they are listed. If a preferred attribute is configured incorrectly, Text-to-Speech might still return a valid voice, with the incorrect configuration dropped.

Examples of configurations that use required and ordering:

<speak>And there it was <voice language="en-GB" gender="male" required="gender"
ordering="gender language">a flying bird </voice>roaring in the skies for the
first time.</speak>
<speak>Today is supposed to be <voice language="en-GB" gender="female"
ordering="language gender">Sunday Funday.</voice></speak>
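The rule that a <voice> tag needs either a name or at least one of gender, variant, and language can be enforced when the tag is generated. The Python sketch below is a hypothetical helper that does not cover the required and ordering attributes. When both a name and selector attributes are passed, it emits only the name, which is a simplification chosen for this example.

```python
from xml.sax.saxutils import escape, quoteattr

SELECTORS = ("gender", "variant", "language")


def voice_tag(text, name=None, **selectors):
    """Build a <voice> element from a name or from selector attributes."""
    unknown = set(selectors) - set(SELECTORS)
    if unknown:
        raise ValueError(f"unsupported attributes: {sorted(unknown)}")
    if name is None and not selectors:
        raise ValueError("provide name, or at least one of gender, variant, language")
    attrs = {"name": name} if name is not None else selectors
    rendered = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs.items())
    return f"<voice {rendered}>{escape(text)}</voice>"


french = voice_tag("qu'est-ce qui t'amène ici", language="fr-FR", gender="female")
named = voice_tag("mais la chat est mignon", name="fr-CA-Wavenet-B")
```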

<lang>

You can use <lang> to include text in multiple languages within the same SSML request. All languages are synthesized in the same voice unless you use the <voice> tag to change the voice explicitly. The xml:lang string must contain the target language in BCP-47 format (this value is listed as "language code" in the supported voices table). In the following example, "chat" is verbalized in French instead of the default language (English):

<speak>The french word for cat is <lang xml:lang="fr-FR">chat</lang></speak>

Text-to-Speech supports the <lang> tag on a best-effort basis. Not all language combinations produce results of the same quality when specified in the same SSML request. In some cases, a language combination produces an effect that is detectable but subtle, or that is perceived as negative. Known issues:

  • Japanese with Kanji characters is not supported by the <lang> tag. The input is transliterated and read as Chinese characters.
  • Semitic languages such as Arabic, Hebrew, and Persian are not supported by the <lang> tag and result in silence. To use any of these languages, use the <voice> tag to switch to a voice that speaks the language (if available).

SSML timepoints

The Text-to-Speech API supports timepoints in the audio data that it generates. A timepoint is a timestamp, in seconds measured from the beginning of the generated audio, that corresponds to a designated point in the script. You set a timepoint in the script with the <mark> tag. When the audio is generated, the API returns the time offset between the beginning of the audio and each timepoint.

Setting a timepoint takes two steps:

  1. Add a <mark> SSML tag at the point in the script where you want a timestamp.
  2. Set TimepointType to SSML_MARK. If this field is not set, timepoints are not returned by default.

The following example returns two timepoints:

  • timepoint_1: the time (in seconds) at which the word "Mark" appears in the generated audio.
  • timepoint_2: the time (in seconds) at which the word "see" appears in the generated audio.
<speak>Hello <mark name="timepoint_1"/> Mark. Good to <mark
name="timepoint_2"/> see you.</speak>
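Before sending a script, it can help to list the mark names it contains so that the returned timepoints can be checked against them. The Python sketch below uses the standard `xml.etree.ElementTree` parser. The `returned_names` argument stands in for the mark names found in an API response, whose exact shape is not shown in this document and is therefore an assumption.

```python
import xml.etree.ElementTree as ET


def mark_names(ssml):
    """Return the name of every <mark> element, in document order."""
    return [mark.get("name") for mark in ET.fromstring(ssml).iter("mark")]


def missing_timepoints(ssml, returned_names):
    """Return the marks in the script that have no matching returned timepoint."""
    returned = set(returned_names)
    return [name for name in mark_names(ssml) if name not in returned]


ssml = (
    '<speak>Hello <mark name="timepoint_1"/> Mark. Good to '
    '<mark name="timepoint_2"/> see you.</speak>'
)
names = mark_names(ssml)
```

Here `names` is `["timepoint_1", "timepoint_2"]`. An SSML document that uses prefixed tags such as google:style would need a namespace declaration before this parser accepts it.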

Styles

The following voices can speak in multiple styles:

  1. en-US-Neural2-F
  2. en-US-Neural2-J

Use the <google:style> tag to control which style is used. Use the tag only around full sentences.

Example:

<speak><google:style name="lively">Hello I'm so happy today!</google:style></speak>

The name field supports the following values:

  1. apologetic
  2. calm
  3. empathetic
  4. firm
  5. lively