#### Copyright (C) 2025 The Qt Company Ltd.
##### SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only

# Qt CI Analysis Bot

## What is this?

This application is an AI-powered tool designed to analyze Continuous Integration (CI) failures for Qt projects hosted on Gerrit. It leverages Azure OpenAI's GPT-4 model to provide insights into why a CI run might have failed in relation to the code changes submitted.

## Goal

The primary goal is for GPT-4 to determine the relevance of a test failure (or other CI errors) against the changes made to the source code in a given Gerrit change. If the change is deemed to have caused the failure, GPT-4 is further tasked with suggesting a possible failure mode and, where applicable, pinpointing relevant code sections.

## How it Works

The bot operates through the following general workflow:

1. **Trigger**:
    * It can be run in standalone mode for a specific change.
    * It can run as a webserver, listening for webhook events from Gerrit. Supported events include `change-integration-fail` (a custom event for Qt's Gerrit) and specific `comment-added` events (e.g., "Quick Check: Failed" by `qt_ci_bot`).
2. **Data Collection**:
    * **Integration ID & Tested Changes**: For a given Gerrit change, it first identifies the relevant COIN (CI system) integration ID and the list of changes tested in that integration run.
    * **Failure Logs**: It retrieves the failure logs from COIN for the identified integration ID.
    * **Log Analysis (Initial)**: The logs are processed to extract error snippets. This involves:
        * `logTools.js`: Parsing test failures (e.g., `FAIL! : tst_MyClass::myFunction()`) or analyzing log chunks for general errors using AI (`aiAnalyzeLogChunk`).
        * `aiTools.js`: AI-assisted snipping (`aiSnipTestFailure`) to refine error context.
    * **Gerrit Data**:
        * `gerritTools.js`: Fetches the complete diff of the primary change from Gerrit.
        * Optionally, it attempts to fetch the source code of the failed test(s) if identified.
    * **Flaky Test Check**:
        * `dbTools.js`: Queries a PostgreSQL database to determine if the failed test(s) are known to be flaky.
3. **AI Analysis (GPT-4)**:
    * `aiTools.js` (`aiAnalyzeCollectedData`): All collated data (log snippets, error summaries, change diff, test source code snippets, flaky test info, platform identifier, list of tested changes) is assembled into a prompt.
    * This prompt is sent to Azure OpenAI (GPT-4) for analysis. The AI is asked to determine if the primary code change caused the failure and to provide a rationale.
4. **Reporting**:
    * **Gerrit Comment**: The analysis result from GPT-4 is posted as a comment on the Gerrit change.
    * **Failure Classification**:
        * `aiTools.js` (`aiGuessFailClassification`): The AI's output is further processed to classify the failure type (e.g., infrastructure, real build failure, flaky test).
        * `dbTools.js`: This classification is written to an InfluxDB database for metrics and tracking.
5. **Queueing**:
    * When running as a webserver, incoming requests for analysis are queued per integration ID to process them sequentially and avoid overwhelming downstream services or rate limits.

## Running the Application

### Prerequisites

* Node.js and npm.
* Run `npm install` to install dependencies listed in `package.json`.

### Configuration

The application requires configuration for various services:

* Gerrit (URL, credentials)
* Azure OpenAI (Client ID, Tenant ID, Client Secret, Deployment, API Version)
* PostgreSQL (for the flaky test database)
* InfluxDB (for failure classification metrics)

Configuration is primarily managed through a `config.json` file. A template `config.json.template` is provided. Copy this template to `config.json` and fill in the necessary credentials and endpoints.
Values in `config.json` can be overridden by environment variables (e.g., the `GERRIT_URL` environment variable takes precedence over `GERRIT_URL` in `config.json`).

### Standalone Mode

To analyze a single change without running the webserver, pass the full Gerrit change ID as an argument:

```bash
node main.js <full-gerrit-change-id>
```

Example:

```bash
node main.js qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567
```

The analysis result will be printed to the console, and if not disabled, a comment will be posted to Gerrit.

### Webserver Mode

To run the application as a webserver that listens for Gerrit webhooks, run it without arguments:

```bash
node main.js
```

The server will start and listen on the port defined by `WEBHOOK_PORT` in `config.json`.

### Environment Variables

* `LOG_LEVEL`: Sets the logging level (e.g., `info`, `debug`). Defaults to `info`.
* `NO_POST`: If set to `1`, the bot will not post comments to Gerrit. This is useful for testing.

  ```bash
  NO_POST=1 node main.js qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567
  ```

* Service-specific credentials (e.g., `GERRIT_USER`, `AZURE_CLIENT_ID`) can be set as environment variables to override `config.json`.

## API Endpoints

When running in webserver mode, the following API endpoints are available:

### `POST /`

* **Description**: The main webhook endpoint for receiving events from Gerrit.
* **Request Body**: JSON payload from Gerrit.
* **Supported Event Types**:
    * `change-integration-fail`: Triggered when a CI integration fails for a change.
    * `comment-added`: Specifically listens for comments from `qt_ci_bot` containing "Quick Check: Failed" and a `Verified: -1` approval. This allows re-triggering analysis on quick check failures.
* **Response**:
    * `200 OK`: Responds with 200 to acknowledge receipt of supported events, even if internal processing fails. Errors are logged internally.
    * `400 Bad Request`: If the event type is not supported.

### `GET /status`

* **Description**: Returns current token usage statistics for the OpenAI API.
* **Query Parameters**: None.
* **Response**: JSON object with token counts.

#### Note: sums are not persistent between application startups.

```json
{
    "completionTokens": 1500,
    "promptTokens": 5000,
    "totalTokens": 6500,
    "total_cost": "$0.012"
}
```

### `GET /runAnalysis`

* **Description**: Manually triggers an analysis for a specific change or integration ID. This endpoint is primarily intended for testing and debugging, or for when a simple restage recommendation is required. The analysis result is returned directly in the response, and no comment is posted to Gerrit (equivalent to `NO_POST=1` behavior for this endpoint).
* **Query Parameters**:
    * `changeId=`: The full Gerrit change ID to analyze.
        * Example: `/runAnalysis?changeId=qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567`
    * `integrationId=`: The COIN integration ID to analyze. If used, `changeId` is not required for fetching logs, but the analysis will not include change diffs in its context. This yields a simpler, log-only analysis.
        * Example: `/runAnalysis?integrationId=1234567`
* **Response**:
    * `200 OK`: JSON object containing the analysis result (the same structure that would be posted to Gerrit).
    * `400 Bad Request`: If the `changeId` format is invalid, if neither `changeId` nor `integrationId` is provided, or if CI was cancelled for the change.
    * `500 Internal Server Error`: If an error occurs during processing (e.g., the integration ID cannot be fetched).

## Sample outputs

### ==== Positive correlation ====

Failure Summary:
The test `tst_QMessageBox::staticSourceCompat(widget)` failed because the actual return value did not match the expected value.

Test Code Snippet (tst_qmessagebox.cpp):

```cpp
483: ret = QMessageBox::information(nullptr, "title", "text", QMessageBox::Yes, QMessageBox::No);
484: QCOMPARE(ret, QMessageBox::No);
```

Suggested Action:
The change caused the failure. The patch modified the behavior of `QMessageBox::showOldMessageBox()` to correctly set the `QMessageBox::Default` flag on the default button.
The test expected the `QMessageBox::No` button to be the default, but with the change, the `QMessageBox::Yes` button is now correctly set as the default. The test must be updated to expect the correct default button according to the new behavior introduced by the change.

### ==== Negative correlation ====

Failure Summary:
The test `tst_Moc::initTestCase()` failed because the actual standard error output from a process was not empty, while the test expected it to be empty.

Test Code Snippet ([tst_moc.cpp]):

```cpp
892: QVERIFY(proc.waitForFinished());
893: VERIFY_NO_ERRORS(proc);
```

Suggested Action:
The change in the diff is not related to the failure. The diff shows a modification in a different test file (`tst_qtableview.cpp`) which is unrelated to the moc test (`tst_moc.cpp`). The failure is likely due to an environmental issue or a change in another part of the codebase.

## Known limitations

* Because this bot operates change-wise, the only context an analysis has about the other changes tested in the same integration is the subject lines of those changes' commit messages. As a result, changes which depend on each other may trigger false positives or false negatives, because an individual analysis does not have the full picture of a developer's intended change when it spans multiple patches tested together.
* Relevant filename parsing using GPT-4 is inconsistent and prone to hallucinations. In many cases the LLM is either not given enough context in a log snippet to identify a full executable name, or simply includes irrelevant file(s) during log analysis. Improving the hit rate of filename discovery (leading to inclusion of relevant sources in the analysis) would significantly improve analysis quality for both build and test failures.
* As of May 2025, GPT-4o is outdated, and higher analysis quality could be achieved with newer models, especially given their larger context limits.
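The `FAIL! :` pattern mentioned under Data Collection can be matched with a plain regular expression before any AI-assisted snipping takes place. The sketch below is illustrative only, under the assumption that Qt Test failure lines follow the `FAIL!  : Class::function(dataTag)` shape shown in the samples above; it is not the actual `logTools.js` implementation.

```javascript
// Hypothetical sketch of extracting a Qt Test failure line such as:
//   FAIL!  : tst_QMessageBox::staticSourceCompat(widget) Compared values are not the same
// Illustrative only; not the actual logTools.js parser.
const FAIL_LINE = /^FAIL!\s*:\s*(\w+)::(\w+)\(([^)]*)\)/;

function parseFailLine(line) {
  const match = FAIL_LINE.exec(line);
  if (match === null) return null; // not a test-failure line
  return {
    testClass: match[1],    // e.g. "tst_QMessageBox"
    testFunction: match[2], // e.g. "staticSourceCompat"
    dataTag: match[3],      // e.g. "widget" (empty string when absent)
  };
}
```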
## Further reading

* [Statistics on why integrations fail in Qt's CI (according to this bot's analysis)](https://testresults.qt.io/grafana/d/de0ckrnyynwu8d/ci-integration-failure-reasons?orgId=1&from=now-7d&to=now&timezone=browser&var-cl=$__all&var-repo=$__all)
* [Qt's Public CI system (COIN)](https://testresults.qt.io/coin/tasks)
* [Statistics on Qt's CI flakiness](https://testresults.qt.io/grafana/d/3q5k7mrGz/restaging-statistics?var-average_over=7d&orgId=1&from=now-1y&to=now&timezone=browser&var-Branch=dev&var-repo=qt%2Fqtbase)