20 Sep 2024

OpenAI o1 The Reasoning Model

On September 12, 2024, OpenAI introduced a new model with the release of OpenAI o1-preview and OpenAI o1-mini.

Let’s start by asking what everyone is thinking, what is up with these names? There must be some marketing dollars behind them, but we do not find them particularly descriptive or helpful. Speaking for at least some of us, they are likely to cause confusion generally and, moreover, specifically to neurodivergent individuals. But we digress. Let's share the little we know right right now as the release happened late last week:

What's the difference between the o1 and Chatgpt models?

Reasoning

The hype around this release is that this new family claims to be a “reasoning” model designed to solve more complex problems in science, math, and coding by spending more time thinking before responding.

The black box details remain vague, but the core difference is on how o1 is trained to handle complex reasoning tasks, using what OpenAI calls a "chain of thought" methodology. This allows the model to think more deeply and experiment with different strategies before providing an answer, which improves its ability to handle complex problems. GPT-4o was not designed in this manner and did not perform very well on complex tasks. This development could be extremely helpful, when perfected, to lawyers.

As highlighted in the research materials hyperlinked above, the data shows that this model can handle tasks with the depth and accuracy of PhD-level reasoning, significantly improving performance.

In benchmark tasks such as the International Mathematics Olympiad (IMO) qualifying exam, o1-preview demonstrated its prowess by solving 83% of the problems, a sharp improvement over the 13% success rate of its predecessor, GPT-4o.

OpenAI Sources

On the IMO math benchmark, o1-mini scored 70%, nearly matching the 74% of o1-preview while offering a significantly lower inference cost. It also performed competitively in coding evaluations, achieving an Elo score of 1650 on Codeforces, positioning it among the top 86% of programmers. Significantly cheaper, compared to o1-preview, the o1-mini's target are developers and researchers who require reasoning capabilities but do not need the broader knowledge that the more advanced o1-preview model offers.

2. Security

We also like that OpenAI seems more focused on safety, implementing enhanced training that allows the model to better follow safety rules and resist jailbreaking attempts. In tests, o1 scored significantly higher on safety than GPT-4.

Red teaming involves skilled testers acting as adversaries to try and break the model's safety features by pushing it into producing harmful, biased, or unsafe outputs. Red teaming helps ensure that vulnerabilities are uncovered and addressed before the model is released widely, making it more reliable and secure.

For AI, red teaming might involve tasks such as:

Trying to trick the model into producing harmful, biased, or offensive outputs.

Testing how well the AI follows ethical guidelines when asked to generate inappropriate content.

Pushing the model to generate dangerous misinformation or instructions, like details on illegal activities.

Jailbreaking techniques might involve cleverly worded prompts that bypass filters or exploit gaps in the model’s logic. The model might be tricked into performing tasks against its programming, like revealing personal information or producing content it has been trained not to provide. For example, in OpenAI's testing, they found that GPT-4o, an earlier model, could only score a 22/100 in resisting jailbreak attempts, while the new o1-preview model scored an 84/100, indicating much better performance in maintaining adherence to safety rules even under pressure.

The o1 model also includes collaborations with U.S. and U.K. AI Safety Institutes, highlighting the importance OpenAI places on ensuring that these models are not only more capable but also safer for public use.

3. Initial Impressions

In our initial testing, the model shows possibilities, and we like the fact that it shows you its reasoning steps, so they can be tweaked. But the o1 series is still in "preview," with ongoing improvements and updates. It still lacks, for example, browsing, file, and image uploading. For these uses, ChatGPT 4o is the go-to model, for now.

How can the o1 Model Benefit Legal Practice

Too early to tell, but its enhanced reasoning capabilities makes it potentially more suitable for legal practice than its predecessors.

On that note, legal tech vendors should be considering how and when to integrate this newer model into their offerings.

The legal buyer should continue to be vigilant on the technology their vendors are using and how it has been dialed to work. When evaluating AI tools for legal practice, the vendor checklist should include, for example, AI model type and functionality, security, privacy, and confidentiality, jailbreaking and red teaming, performance and monitoring, integration and usability, compliance and ethical considerations, cost and licensing, governance and accountability, data handling and ownership, future development and roadmap, and ethical and bias considerations.

Before diving into the world of AI-driven legal tech, it’s essential to ensure these foundational issues are addressed and understood.

Conclusion

The shift from ChatGPT-4 to more advanced generative AI models presents both an opportunity and a challenge for the legal industry. The enhanced reasoning capabilities make these tools highly suitable for legal work, offering more in-depth analysis and improved decision-making potential. But law firms must be vigilant about understanding the technology they adopt, ensuring that the AI is secure, reliable, and tested for vulnerabilities.

As legal tech vendors move quickly to bring these new models into their offerings, lawyers must take the time to understand what they’re getting. Don’t be swayed by flashy new features alone—dig deeper into how these tools operate, how they are secured, and how they can truly benefit your practice. Only by balancing innovation with scrutiny can the legal field fully embrace the potential of generative AI, while upholding the ethical standards that define the profession.

Stay tuned for the latest updates as we continue our testing. If you have questions, email our team at info@kartalegal.com.