Last updated 2025-09-04 UTC.

# Best practices to provide data to the Speech-to-Text API

This document contains recommendations on how to provide speech data to the
Speech-to-Text API. These guidelines are designed for greater efficiency and
accuracy, as well as reasonable response times from the service. The
Speech-to-Text API works best when the data sent to the service is within the
parameters described in this document.

If you follow these guidelines and don't get the results you expect from the
API, see [Troubleshooting & Support](/speech-to-text/docs/support).

Sampling rate
-------------

If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise,
set
[sample_rate_hertz](/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#recognitionconfig)
to match the native sample rate of the audio source (instead of re-sampling).

Frame size
----------

Streaming recognition recognizes live audio as it is captured from a microphone
or other audio source. The audio stream is split into frames and sent in
consecutive `StreamingRecognizeRequest` messages. Any frame size is acceptable:
larger frames are more efficient but add latency. A 100-millisecond frame size
is recommended as a good tradeoff between latency and efficiency.

Audio pre-processing
--------------------

It's best to provide audio that is as clean as possible by using a good-quality,
well-positioned microphone.
However, applying noise-reduction signal processing to the audio before sending
it to the service typically reduces recognition accuracy. The service is
designed to handle noisy audio.

For best results:

- Position the microphone as close as possible to the person who is speaking,
  particularly when background noise is present.
- Avoid audio clipping.
- Don't use automatic gain control (AGC).
- Disable all noise-reduction processing.
- Listen to some sample audio. It should sound clear, without distortion or
  unexpected noise.

Request configuration
---------------------

Make sure that you accurately describe the audio data sent with your request to
the Speech-to-Text API. Ensuring that the
[RecognitionConfig](/speech-to-text/docs/reference/rpc/google.cloud.speech.v1#recognitionconfig)
for your request specifies the correct `sampleRateHertz`, `encoding`, and
`languageCode` results in the most accurate transcription and billing for your
request.
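As a concrete illustration of the frame-size and configuration guidance above, the following sketch computes a 100 ms frame size for 16 kHz mono LINEAR16 audio, splits a buffer into frames, and builds a matching request config. The constants, the `frames` helper, and the synthetic silent buffer are assumptions for this example, not client-library code.

```python
# Illustrative sketch only: splitting raw LINEAR16 audio into the recommended
# ~100 ms frames for streaming recognition, plus a config whose sampleRateHertz
# matches the audio as captured (no re-sampling). Not part of any client library.

SAMPLE_RATE_HZ = 16000       # recommended sampling rate
BYTES_PER_SAMPLE = 2         # LINEAR16 is 16-bit signed PCM
FRAME_MS = 100               # recommended latency/efficiency tradeoff

# Bytes per 100 ms frame: 16000 samples/s * 2 bytes/sample * 0.1 s = 3200 bytes.
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000

def frames(audio: bytes, size: int = FRAME_BYTES):
    """Yield consecutive frames; each would fill one StreamingRecognizeRequest."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]

# The config should describe the audio exactly as it was captured.
config = {
    "encoding": "LINEAR16",
    "sampleRateHertz": SAMPLE_RATE_HZ,
    "languageCode": "en-US",
}

# One second of (silent) audio splits into ten 100 ms frames.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(frames(one_second))
```

Choosing the frame size this way keeps each request small enough for low latency while avoiding the per-message overhead of very short frames.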