動作識別

動作辨識功能可從影片片段中識別不同動作,例如走路或跳舞。這些動作可能會在影片播放期間執行,也可能不會。

使用 AutoML 模型

事前準備

如要瞭解如何建立 AutoML 模型,請參閱 Vertex AI 新手指南。如需如何建立 AutoML 模型的說明,請參閱 Vertex AI 說明文件「開發及使用機器學習模型」中的「影片資料」。

使用您的 AutoML 模型

下列程式碼範例說明如何使用串流用戶端程式庫,透過 AutoML 模型進行動作辨識。

Python

如要向 Video Intelligence 進行驗證,請設定應用程式預設憑證。 詳情請參閱「為本機開發環境設定驗證」。

import io

from google.cloud import videointelligence_v1p3beta1 as videointelligence

# path = 'path_to_file'
# project_id = 'project_id'
# model_id = 'automl_action_recognition_model_id'

client = videointelligence.StreamingVideoIntelligenceServiceClient()

model_path = "projects/{}/locations/us-central1/models/{}".format(
    project_id, model_id
)

automl_config = videointelligence.StreamingAutomlActionRecognitionConfig(
    model_name=model_path
)

video_config = videointelligence.StreamingVideoConfig(
    feature=videointelligence.StreamingFeature.STREAMING_AUTOML_ACTION_RECOGNITION,
    automl_action_recognition_config=automl_config,
)

# config_request should be the first in the stream of requests.
config_request = videointelligence.StreamingAnnotateVideoRequest(
    video_config=video_config
)

# Set the chunk size to 5MB (recommended less than 10MB).
chunk_size = 5 * 1024 * 1024

def stream_generator():
    yield config_request
    # Load file content.
    # Note: Input videos must have supported video codecs. See
    # https://cloud.google.com/video-intelligence/docs/streaming/streaming#supported_video_codecs
    # for more details.
    with io.open(path, "rb") as video_file:
        while True:
            data = video_file.read(chunk_size)
            if not data:
                break
            yield videointelligence.StreamingAnnotateVideoRequest(
                input_content=data
            )

requests = stream_generator()

# streaming_annotate_video returns a generator.
# The default timeout is about 300 seconds.
# To process longer videos it should be set to
# larger than the length (in seconds) of the video.
responses = client.streaming_annotate_video(requests, timeout=900)

# Each response corresponds to about 1 second of video.
for response in responses:
    # Check for errors.
    if response.error.message:
        print(response.error.message)
        break

    for label in response.annotation_results.label_annotations:
        for frame in label.frames:
            print(
                "At {:3d}s segment, {:5.1%} {}".format(
                    frame.time_offset.seconds,
                    frame.confidence,
                    label.entity.entity_id,
                )
            )