TL;DR
This video compares the performance of GPT 5.2 and Opus 4.5 in three coding tasks using Open Code: creating a YouTube thumbnail generator, debugging a complex email capture page issue, and updating a subscriber dashboard design. Opus 4.5 generally outperforms GPT 5.2 in terms of speed, instruction following, and problem-solving, while GPT 5.2 sometimes delivers better UI design but struggles with complex debugging and can be lazy in creative tasks.
- Opus 4.5 is faster and more reliable in following instructions and debugging.
- GPT 5.2 can produce visually appealing UIs but struggles with complex logic and thoroughness.
- The choice between the two depends on the specific task requirements, with Opus favored for complex problem-solving and GPT for simpler, visually-oriented tasks.
Introduction [0:00]
The video introduces GPT 5.2, highlighting claims that it surpasses Opus 4.5 in performance. The presenter aims to test this assertion using Open Code to ensure a fair comparison, focusing solely on the models' capabilities without the influence of wrappers or tools. Three tests will be conducted: creating an application, debugging complex code, and assessing creativity in design updates.
Test 1: Application Creation [1:00]
The first test involves creating a YouTube thumbnail generator. Opus completed the task in 4 minutes, while GPT took almost 9 minutes. The presenter specified "Do not ask any question, any warning, just process all the command without any human validation" in the prompt. Opus followed these instructions, but GPT asked questions during the process, indicating a failure to adhere to the prompt.
GPT Performance Analysis [2:15]
GPT 5.2 took 9 minutes to complete the task, while Opus 4.5 only took 4 minutes. GPT 5.2 asked questions during the process despite being instructed not to, while Opus 4.5 followed instructions correctly. On the other hand, GPT 5.2 consumed fewer tokens (27,000 versus Opus's 35,000) and has a larger context window.
User Interface Test [4:15]
The presenter launches both applications to compare their outputs. GPT 5.2's interface is described as cool but not very minimalist, featuring a dark/light theme option and a spaced-out style. The thumbnail generation functions well, producing a visually appealing result with a loading state. Opus 4.5 presents a more minimalist design, aligning better with the prompt's request for a "clean, mini, and midjourney-inspired" UI.
Minimalist Design Comparison [7:15]
Opus 4.5's design is more minimalist, aligning with the prompt's instructions for a clean and simple UI inspired by Midjourney. While GPT 5.2's interface is visually appealing with more features, it is considered complex and not truly minimalist. The drag-and-drop functionality works well in both versions.
Test 1 Result [9:30]
The presenter finds GPT 5.2's interface more stylish and user-friendly, but Opus 4.5 better adheres to the prompt's minimalist design requirement. Both models successfully created a functional thumbnail generator. GPT 5.2, despite taking longer, delivered a very good result and followed the design brief well, though in other sessions it occasionally failed to complete the feature.
Test 2: Complex Debugging [11:00]
The second test involves debugging a complex issue on an email capture page where a recent change caused infinite loading. GPT 5.2 took 24.3 minutes but failed to resolve the problem, modifying a proxy file incorrectly. Opus 4.5, on the other hand, took only 3 minutes and 44 seconds and successfully fixed the bug by updating a different file.
Logic Error Analysis [13:15]
GPT 5.2 took 24 minutes and used 52,000 tokens but failed to fix the bug, while Opus 4.5 resolved it in under 4 minutes. The presenter notes the bug's complexity: it involves a Wasm build used for server-side React code in Next.js. The prompt instructed the models not to ask questions and to resolve the issue, but GPT 5.2 failed completely.
Bug Resolution and Performance [15:15]
Opus 4.5 successfully resolved the bug in 3 minutes and 44 seconds, while GPT 5.2 failed after 24 minutes. Opus was able to identify the commit that introduced the bug by reviewing the commit history, while GPT did not utilize the tools effectively to find the root cause. The presenter emphasizes that Opus's ability to follow instructions and solve complex problems makes it superior in this test.
Test 3: Creativity and Design [18:15]
The third test assesses creativity by tasking the models with updating a subscriber dashboard. Codex (running GPT 5.2) completed the task in 2 minutes, while Claude (running Opus 4.5) took 6 minutes. The goal was to refactor the dashboard to align with the style of other pages, using provided URLs as references.
Final Model Comparison [21:30]
GPT 5.2 modified the layout, breaking the page's responsiveness and only adjusting spacing, which the presenter deemed lazy and insufficient. Opus 4.5, however, refactored the dashboard, compacting the interface and updating components to maintain visual consistency with the rest of the application. Opus provided a detailed explanation of its design choices and demonstrated a better understanding of the task requirements.
Conclusion and Subscription [24:00]
Opus 4.5 demonstrated superior creativity and problem-solving skills compared to GPT 5.2 in the dashboard update task. GPT 5.2 was faster but lazy, failing to implement significant changes or adhere to the prompt's intent. The presenter concludes that while GPT 5.2 is cheaper, Opus 4.5 is a better model overall, especially for complex tasks. The presenter encourages viewers to subscribe to the channel.