I tested Opus 4.6 and Codex 5.3... The result SHOCKED me

TL;DR

This video compares the performance of Opus 4.6 and GPT 5.3 Codex in various coding tasks. It covers initial reactions to Opus 4.6's release, focusing on its increased context window and benchmark results. The video includes tests ranging from simple application creation to complex feature implementation, assessing each model's speed, code quality, and user experience.

Opus 4.6 has an increased context window of 1 million tokens.
The models are tested on tasks like Next.js app creation, 3D animation with Three.js, and voice chat integration.
Codex excels in complex feature implementation, while Opus shows speed and style advantages in other tests.

Introduction [0:00]

The video introduces the release of Opus 4.6, highlighting its updated benchmarks and a significant increase in context window to 1 million tokens. The presenter outlines a series of tests designed to compare Opus 4.6 against GPT 5.3 Codex, including application development and feature implementation. The goal is to evaluate the models' performance and capabilities in real-world coding scenarios.

New features and AI benchmarks [1:00]

The presenter reviews social media reactions to the Opus 4.6 release, noting positive feedback on its smart testing upgrades. Benchmark comparisons reveal that Opus 4.6 outperforms previous versions in Terminal Bench but shows a slight decrease in SWE bench verified scores. A key highlight is Opus 4.6's ability to manage multiple tasks and teams autonomously, signaling advancements in AI-driven project management.

Long-context retrieval analysis [3:15]

The most significant update in Opus 4.6 is its expanded context window of 1 million tokens, which improves its ability to retrieve and use information from large contexts. This addresses a major limitation of previous Opus versions and allows more thorough reasoning over the available context in complex tasks. The presenter emphasizes the importance of this feature for improving workflow and handling extensive data.

Overview of the comparative tests [4:45]

The video outlines a series of comparative tests between Opus 4.6 and GPT 5.3 Codex. These tests include creating a Next.js application, solving math and physics problems, generating 3D animations with Three.js, and implementing features in a chat application and dashboard. The presenter emphasizes that these tests are designed to assess the models' practical coding abilities and efficiency.

Test 1 Next.js application [6:00]

In the first test, both models are tasked with creating a Next.js application for generating YouTube thumbnails. Opus 4.6 produces a consistent, minimalist design, while GPT 5.3 Codex generates an application with inconsistent spacing and shadows. Opus 4.6 also creates a better prompt for generating thumbnails, leading to a more usable result. Despite some minor issues, Opus 4.6 is deemed the winner of this test due to its superior style and prompt adherence.

Test 2 Math and physics [10:30]

The second test involves creating an HTML file with a bouncing ball inside a rotating heptagon. Both models generate the animation, but GPT 5.3 Codex initially outputs the code as text. After modifications, both models produce functional animations, with GPT 5.3 Codex offering slightly more dynamic bouncing. Opus 4.6 is favored for its faster execution and cleaner interface, earning it another point in the comparison.
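To make the task concrete, here is a minimal sketch of the physics such an animation needs: gravity, a regular heptagon rotating about its center, and a reflection of the ball's velocity relative to the moving wall. It is not the code either model produced in the video. Every name and parameter (State, Params, step, the restitution value, the y-down axis) is an assumption chosen for illustration, and rendering to a canvas is left out.

```typescript
// Illustrative sketch only: all names and parameters are hypothetical,
// not taken from the video or from either model's output.
type Vec = { x: number; y: number };

interface State {
  pos: Vec;      // ball center
  vel: Vec;      // ball velocity
  theta: number; // current rotation angle of the heptagon (radians)
}

interface Params {
  R: number;           // circumradius of the heptagon
  r: number;           // ball radius
  omega: number;       // angular velocity of the heptagon (rad/s)
  gravity: number;     // acceleration along +y (canvas-style, y points down)
  restitution: number; // 1 = perfectly elastic, 0 = no bounce
}

const SIDES = 7;

const dot = (a: Vec, b: Vec): number => a.x * b.x + a.y * b.y;

// k-th vertex of a regular heptagon centered at the origin, rotated by theta.
function vertex(R: number, theta: number, k: number): Vec {
  const a = theta + (2 * Math.PI * k) / SIDES;
  return { x: R * Math.cos(a), y: R * Math.sin(a) };
}

// Unit normal of edge (a, b) pointing toward the center of the polygon.
function inwardNormal(a: Vec, b: Vec): Vec {
  const mx = (a.x + b.x) / 2;
  const my = (a.y + b.y) / 2;
  const len = Math.hypot(mx, my);
  return { x: -mx / len, y: -my / len };
}

// Signed distance from p to each edge line; positive means inside.
function edgeDistances(p: Vec, R: number, theta: number): number[] {
  const out: number[] = [];
  for (let k = 0; k < SIDES; k++) {
    const a = vertex(R, theta, k);
    const b = vertex(R, theta, (k + 1) % SIDES);
    out.push(dot({ x: p.x - a.x, y: p.y - a.y }, inwardNormal(a, b)));
  }
  return out;
}

// Advance the simulation by dt seconds and return the new state.
function step(s: State, p: Params, dt: number): State {
  const theta = s.theta + p.omega * dt;
  const vel: Vec = { x: s.vel.x, y: s.vel.y + p.gravity * dt };
  const pos: Vec = { x: s.pos.x + vel.x * dt, y: s.pos.y + vel.y * dt };

  for (let k = 0; k < SIDES; k++) {
    const a = vertex(p.R, theta, k);
    const b = vertex(p.R, theta, (k + 1) % SIDES);
    const n = inwardNormal(a, b);
    const d = dot({ x: pos.x - a.x, y: pos.y - a.y }, n);
    if (d >= p.r) continue; // ball is clear of this wall

    // Velocity of the wall at the contact point (rigid rotation about the origin).
    const c: Vec = { x: pos.x - n.x * d, y: pos.y - n.y * d };
    const wallVel: Vec = { x: -p.omega * c.y, y: p.omega * c.x };

    // Reflect the normal component of the velocity relative to the wall,
    // but only if the ball is actually moving into it.
    const vn = dot({ x: vel.x - wallVel.x, y: vel.y - wallVel.y }, n);
    if (vn < 0) {
      const j = (1 + p.restitution) * vn;
      vel.x -= j * n.x;
      vel.y -= j * n.y;
    }

    // Push the ball back so it just touches the wall.
    pos.x += n.x * (p.r - d);
    pos.y += n.y * (p.r - d);
  }
  return { pos, vel, theta };
}
```

In a browser, step would typically be called once per frame from requestAnimationFrame and followed by a draw call. Because the heptagon is convex, checking the ball against each edge line is enough to keep it inside.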

Test 3 3D animation with Three.js [15:45]

The third test challenges the models to create a 3D animation of SpongeBob's world using Three.js. GPT 5.3 Codex initially struggles, outputting the 3D code within the chat interface instead of a file. Opus 4.6 successfully generates a visually appealing 3D scene with detailed elements. Despite some initial color issues, Opus 4.6's superior output and faster execution secure another win.

Analysis of the Three.js results [21:30]

A detailed analysis of the 3D animation test reveals that Opus 4.6 created a more visually complete and stylistically accurate representation of SpongeBob's world. GPT 5.3 Codex, even after corrections, produced a less detailed and less aesthetically pleasing scene. The presenter highlights Opus 4.6's ability to better respect the desired style and deliver a functional result more efficiently.

Test 4 Voice chat integration [23:15]

The fourth test involves integrating voice chat functionality into an existing chat application. GPT 5.3 Codex initially takes over 30 minutes and requires auto-compaction due to context limits but eventually delivers a functional implementation with a clean interface. Opus 4.6, while implementing the core functionality, lacks some UI elements and has a less polished user experience. GPT 5.3 Codex is favored for its superior interface and overall implementation quality.

Code performance comparison [27:15]

A comparison of the code generated by both models for the voice chat integration reveals that GPT 5.3 Codex provides a more refined and user-friendly interface. Opus 4.6, despite achieving functional parity, falls short in UI design and user experience. The presenter notes that GPT 5.3 Codex's ability to deliver a complete solution in fewer prompts demonstrates its strength in complex feature implementation.

Widget and transcription test [30:30]

Testing the widget integration and transcription capabilities of both models shows that GPT 5.3 Codex successfully implements these features with minimal issues. Opus 4.6, while managing to transcribe audio, struggles with widget implementation and UI consistency. The presenter emphasizes the importance of a seamless user experience and gives GPT 5.3 Codex credit for its superior performance in this area.

User experience and interface [35:00]

An assessment of the user experience and interface design highlights GPT 5.3 Codex's strengths in creating intuitive and visually appealing interfaces. Opus 4.6's UI is described as less polished, with inconsistent design elements. The presenter notes that while both models achieve functional goals, GPT 5.3 Codex excels in delivering a more user-friendly and aesthetically pleasing experience.

Conclusion and final verdict [38:00]

In conclusion, the presenter summarizes the test results, noting that Opus 4.6 received three points while GPT 5.3 Codex received one. Despite Opus 4.6's overall lead, GPT 5.3 Codex demonstrates impressive capabilities in complex feature implementation and user interface design. The presenter expresses a strong dislike for the UX of Codex itself, but acknowledges its ability to deliver high-quality code in specific scenarios. The video ends with a call for viewers to share their thoughts and experiences with the models.