#### Copyright (C) 2025 The Qt Company Ltd.
##### SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only

# Qt CI Analysis Bot

## What is this?

This application is an AI-powered tool designed to analyze Continuous Integration (CI) failures for Qt projects hosted on Gerrit. It leverages Azure OpenAI's GPT-4 model to provide insights into why a CI run might have failed in relation to the code changes submitted.

## Goal

The primary goal is for GPT-4 to determine the relevance of a test failure (or other CI errors) against the changes made to the source code in a given Gerrit change. If the change is deemed to have caused the failure, GPT-4 is further tasked with suggesting a possible failure mode and, where applicable, pinpointing relevant code sections.

## How it Works

The bot operates through the following general workflow:

1. **Trigger**:
    * It can be run in standalone mode for a specific change.
    * It can run as a webserver, listening for webhook events from Gerrit. Supported events include `change-integration-fail` (a custom event for Qt's Gerrit) and specific `comment-added` events (e.g., "Quick Check: Failed" by `qt_ci_bot`).
2. **Data Collection**:
    * **Integration ID & Tested Changes**: For a given Gerrit change, it first identifies the relevant COIN (CI system) integration ID and the list of changes tested in that integration run.
    * **Failure Logs**: It retrieves the failure logs from COIN for the identified integration ID.
    * **Log Analysis (Initial)**: The logs are processed to extract error snippets. This involves:
        * `logTools.js`: Parsing test failures (e.g., `FAIL! : tst_MyClass::myFunction()`) or analyzing log chunks for general errors using AI (`aiAnalyzeLogChunk`).
        * `aiTools.js`: AI-assisted snipping (`aiSnipTestFailure`) to refine error context.
    * **Gerrit Data**:
        * `gerritTools.js`: Fetches the complete diff of the primary change from Gerrit.
        * Optionally, it attempts to fetch the source code of the failed test(s) if identified.
    * **Flaky Test Check**:
        * `dbTools.js`: Queries a PostgreSQL database to determine if the failed test(s) are known to be flaky.
3. **AI Analysis (GPT-4)**:
    * `aiTools.js` (`aiAnalyzeCollectedData`): All collated data (log snippets, error summaries, change diff, test source code snippets, flaky test info, platform identifier, list of tested changes) is assembled into a prompt.
    * This prompt is sent to Azure OpenAI (GPT-4) for analysis. The AI is asked to determine if the primary code change caused the failure and to provide a rationale.
4. **Reporting**:
    * **Gerrit Comment**: The analysis result from GPT-4 is posted as a comment on the Gerrit change.
    * **Failure Classification**:
        * `aiTools.js` (`aiGuessFailClassification`): The AI's output is further processed to classify the failure type (e.g., infrastructure, real build failure, flaky test).
        * `dbTools.js`: This classification is written to an InfluxDB database for metrics and tracking.
5. **Queueing**:
    * When running as a webserver, incoming requests for analysis are queued per integration ID to process them sequentially and avoid overwhelming downstream services or rate limits.

## Running the Application

### Prerequisites

* Node.js and npm.
* Run `npm install` to install dependencies listed in `package.json`.

### Configuration

The application requires configuration for various services:

* Gerrit (URL, credentials)
* Azure OpenAI (Client ID, Tenant ID, Client Secret, Deployment, API Version)
* PostgreSQL (for the flaky test database)
* InfluxDB (for failure classification metrics)

Configuration is primarily managed through a `config.json` file. A template `config.json.template` is provided. Copy this template to `config.json` and fill in the necessary credentials and endpoints.
Values in `config.json` can be overridden by environment variables (e.g., the `GERRIT_URL` environment variable takes precedence over `GERRIT_URL` in `config.json`).

### Standalone Mode

To analyze a single change without running the webserver, pass the full Gerrit change ID as an argument:

```bash
node main.js <full-gerrit-change-id>
```

Example:

```bash
node main.js qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567
```

The analysis result will be printed to the console, and if not disabled, a comment will be posted to Gerrit.

### Webserver Mode

To run the application as a webserver that listens for Gerrit webhooks, run it without arguments:

```bash
node main.js
```

The server will start and listen on the port defined by `WEBHOOK_PORT` in `config.json`.

### Environment Variables

* `LOG_LEVEL`: Sets the logging level (e.g., `info`, `debug`). Defaults to `info`.
* `NO_POST`: If set to `1`, the bot will not post comments to Gerrit. This is useful for testing.

  ```bash
  NO_POST=1 node main.js qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567
  ```

* Service-specific credentials (e.g., `GERRIT_USER`, `AZURE_CLIENT_ID`) can be set as environment variables to override `config.json`.

## API Endpoints

When running in webserver mode, the following API endpoints are available:

### `POST /`

* **Description**: The main webhook endpoint for receiving events from Gerrit.
* **Request Body**: JSON payload from Gerrit.
* **Supported Event Types**:
    * `change-integration-fail`: Triggered when a CI integration fails for a change.
    * `comment-added`: Specifically listens for comments from `qt_ci_bot` containing "Quick Check: Failed" and a `Verified: -1` approval. This allows re-triggering analysis on quick check failures.
* **Response**:
    * `200 OK`: Responds with 200 to acknowledge receipt of supported events, even if internal processing fails. Errors are logged internally.
    * `400 Bad Request`: If the event type is not supported.

### `GET /status`

* **Description**: Returns current token usage statistics for the OpenAI API.
* **Query Parameters**: None.
* **Response**: JSON object with token counts.

#### Note: sums are not persistent between application startups.

```json
{
    "completionTokens": 1500,
    "promptTokens": 5000,
    "totalTokens": 6500,
    "total_cost": "$0.012"
}
```

### `GET /runAnalysis`

* **Description**: Manually triggers an analysis for a specific change or integration ID. This endpoint is primarily intended for testing and debugging, or for when a simple restage recommendation is required. The analysis result is returned directly in the response, and no comment is posted to Gerrit (equivalent to `NO_POST=1` behavior for this endpoint).
* **Query Parameters**:
    * `changeId=`: The full Gerrit change ID to analyze.
        * Example: `/runAnalysis?changeId=qt/qtbase~dev~I123456789abcdef0123456789abcdef01234567`
    * `integrationId=`: The COIN integration ID to analyze. If used, `changeId` is not required for fetching logs, but the analysis will not include change diffs in its context. This yields a simpler, log-only analysis.
        * Example: `/runAnalysis?integrationId=1234567`
* **Response**:
    * `200 OK`: JSON object containing the analysis result (the same structure that would be posted to Gerrit).
    * `400 Bad Request`: If the `changeId` format is invalid, if neither `changeId` nor `integrationId` is provided, or if CI was cancelled for the change.
    * `500 Internal Server Error`: If an error occurs during processing (e.g., the integration ID cannot be fetched).

## Sample outputs

### ==== Positive correlation ====

Failure Summary:
The test `tst_QMessageBox::staticSourceCompat(widget)` failed because the actual return value did not match the expected value.

Test Code Snippet (tst_qmessagebox.cpp):

```cpp
483: ret = QMessageBox::information(nullptr, "title", "text", QMessageBox::Yes, QMessageBox::No);
484: QCOMPARE(ret, QMessageBox::No);
```

Suggested Action:
The change caused the failure. The patch modified the behavior of `QMessageBox::showOldMessageBox()` to correctly set the `QMessageBox::Default` flag on the default button.
The test expected the `QMessageBox::No` button to be the default, but with the change, the `QMessageBox::Yes` button is now correctly set as the default. The test must be updated to expect the correct default button according to the new behavior introduced by the change.

### ==== Negative correlation ====

Failure Summary:
The test `tst_Moc::initTestCase()` failed because the actual standard error output from a process was not empty, while the test expected it to be empty.

Test Code Snippet ([tst_moc.cpp]):

```cpp
892: QVERIFY(proc.waitForFinished());
893: VERIFY_NO_ERRORS(proc);
```

Suggested Action:
The change in the diff is not related to the failure. The diff shows a modification in a different test file (`tst_qtableview.cpp`) which is unrelated to the moc test (`tst_moc.cpp`). The failure is likely due to an environmental issue or a change in another part of the codebase.

## Known limitations

* Because this bot operates change-wise, the only context an analysis has about the other changes tested in the same integration is the subject lines of those changes' commit messages. As a result, changes which depend on each other may trigger false positives or false negatives, because an individual analysis does not have the full picture of a developer's intended change when it spans multiple patches tested together.
* Relevant filename parsing using GPT-4 is inconsistent and prone to hallucinations. In many cases the LLM is either not given enough context in a log snippet to identify a full executable name, or simply includes irrelevant file(s) during log analysis. Improving the hit rate of filename discovery (leading to inclusion of relevant sources in the analysis) would significantly improve analysis quality for both build and test failures.
* As of May 2025, GPT-4o is outdated, and higher analysis quality could be achieved with newer models, especially given their larger context limits.
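The `FAIL! :` pattern mentioned under Data Collection can be matched with a plain regular expression before any AI-assisted snipping takes place. The sketch below is illustrative only, under the assumption that Qt Test failure lines follow the `FAIL!  : Class::function(dataTag)` shape shown in the samples above; it is not the actual `logTools.js` implementation.

```javascript
// Hypothetical sketch of extracting a Qt Test failure line such as:
//   FAIL!  : tst_QMessageBox::staticSourceCompat(widget) Compared values are not the same
// Illustrative only; not the actual logTools.js parser.
const FAIL_LINE = /^FAIL!\s*:\s*(\w+)::(\w+)\(([^)]*)\)/;

function parseFailLine(line) {
  const match = FAIL_LINE.exec(line);
  if (match === null) return null; // not a test-failure line
  return {
    testClass: match[1],    // e.g. "tst_QMessageBox"
    testFunction: match[2], // e.g. "staticSourceCompat"
    dataTag: match[3],      // e.g. "widget" (empty string when absent)
  };
}
```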
## Further reading

* [Statistics on why integrations fail in Qt's CI (according to this bot's analysis)](https://testresults.qt.io/grafana/d/de0ckrnyynwu8d/ci-integration-failure-reasons?orgId=1&from=now-7d&to=now&timezone=browser&var-cl=$__all&var-repo=$__all)
* [Qt's Public CI system (COIN)](https://testresults.qt.io/coin/tasks)
* [Statistics on Qt's CI flakiness](https://testresults.qt.io/grafana/d/3q5k7mrGz/restaging-statistics?var-average_over=7d&orgId=1&from=now-1y&to=now&timezone=browser&var-Branch=dev&var-repo=qt%2Fqtbase)