CTOBlog

A Brave New World with Old World Challenges

Written by Hubspot User | Mar 27, 2025 4:45:41 PM

The best-laid plans of mice and men oft go awry... and leave us nothing but grief and pain, for promised joy!

Robert Burns, "To a Mouse", 1785

This line from Robert Burns' poem "To a Mouse" is the apology of a farmer to a mouse whose nest he has inadvertently destroyed during the autumn harvest. It inspired the title of John Steinbeck's 1937 novel Of Mice and Men, and several films. But it might just as well apply to the application of elegant technological advances to the practical problems and constraints of modern business.

Solving Practical Problems: Contract Analytics  

At Seal, the first contract analytics vendor, which we formed in 2010 and sold to DocuSign in 2020, our team was at the forefront of using machine learning (ML) to solve practical problems for some of the largest companies in the world. A few years before the sale, my team was struggling to get GANs (Generative Adversarial Networks) to auto-learn contract text, with the aim of enabling them to predict, and eventually support zero- or one-shot learning for, new topics within contracts or within law as a subject.

The Challenge of Convergence  

GANs comprise two deep neural networks: a generator and a discriminator. The two train in an adversarial game in which the generator tries to produce new data and the discriminator attempts to predict whether a given sample is real or generated. A GAN converges when the generator has learned the patterns in the data well enough to produce realistic samples that the discriminator can no longer reliably tell apart from real ones.
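The adversarial loop can be sketched in a few lines. This is a toy illustration, not the Seal system: here the "data" is just a 1-D Gaussian, the generator is a linear map of noise, and the discriminator is a logistic classifier, with gradients written out by hand. All the names and numbers are assumptions for the sketch; note that even this toy's behaviour depends on its random initialisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Generator: fake = w*z + b with noise z ~ N(0,1)
# Discriminator: D(x) = sigmoid(u*x + v)
w, b = 1.0, 0.0
u, v = 0.1, 0.0
lr = 0.05

for step in range(400):
    real = rng.normal(4.0, 0.5, size=64)   # "real" samples
    z = rng.normal(0.0, 1.0, size=64)      # generator noise
    fake = w * z + b                       # generated samples

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    d_real, d_fake = sigmoid(u * real + v), sigmoid(u * fake + v)
    g_real, g_fake = d_real - 1.0, d_fake  # logistic-loss gradients w.r.t. pre-activation
    du = np.mean(g_real * real + g_fake * fake)
    dv = np.mean(g_real + g_fake)
    u, v = u - lr * du, v - lr * dv

    # Generator update: push D(fake) -> 1 (non-saturating loss),
    # back-propagating through the discriminator by hand.
    d_fake = sigmoid(u * fake + v)
    g = (d_fake - 1.0) * u
    dw, db = np.mean(g * z), np.mean(g)
    w, b = w - lr * dw, b - lr * db

print(f"generator parameters after training: w={w:.2f}, b={b:.2f}")
```

The alternating structure is the essential point: each network's loss depends on the other's current parameters, which is exactly what makes convergence so delicate.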

Our project at Seal involved manipulating vast quantities of information, countless hours of training, and much head-scratching: a frustration that will be familiar to many working in the ML field, and one that boils down to three questions:

  1. "How/why did you do that?"

  2. "Why don't you work like you should after all the time we gave you?"

  3. "Why will you not converge??"

The last question is the point of this blog: why the networks would not converge, and how and why this is relevant today with LLMs (Large Language Models). As many will know, LLMs are not GANs, but they share the same overall challenge when it comes to convergence, or, in the case of an LLM, finding the right answer or path through the network. Errors here give rise to what are commonly referred to as 'hallucinations': misleading, spurious, or apparently invented answers.

Starting Points: The Key to Network Convergence  

One of the things we found over the years of work and research on GANs at Seal was that the network would either converge or not based on the simplest of factors: its random starting point. When training such a system, you provide it with examples and counter-examples to start from; then, based on the configuration, the two models compete either to predict correctly or to fool each other. In doing so, they build up a view of what is expected and eventually converge to a stable, usable model for predicting new information.

But herein lies the rub: when you are dealing with millions of files and data points, the only way to do this is to randomly generate starting points, unless you have reviewed all the data in its totality and can produce an algorithm to select the best starting point and example in each case.

This random starting point was the principal factor determining whether a network would converge or fail, much like the famous Drake equation, in which the final two factors, particularly L (the length of time a civilisation will continue to transmit), are critical in deciding whether the outcome is billions of potential extraterrestrial civilisations or just one. Similarly, in GANs, no matter how much effort is put into training and configuring the model, success or failure often hinges on the random starting point, making much of the preceding work feel almost irrelevant!
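The sensitivity to the starting point is easy to demonstrate in miniature. This hypothetical sketch (not the Seal system) runs plain gradient descent on a simple non-convex function; the two runs are identical in every respect except where they start, yet they settle into different minima.

```python
def descend(x, lr=0.01, steps=500):
    """Gradient descent on f(x) = x**4 - 3*x**2 + x, which has two minima."""
    for _ in range(steps):
        grad = 4 * x**3 - 6 * x + 1  # f'(x)
        x -= lr * grad
    return x

left = descend(-2.0)   # starts left of the barrier between the minima
right = descend(2.0)   # starts right of the barrier
print(left, right)     # two different "converged" answers
```

Both runs genuinely converge, in the sense that the gradient vanishes, but they converge to different answers; nothing in the training procedure itself can tell you which basin you started in.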

LLMs: Still Reliant on Starting Points  

How does this relate to where we are now, a few years after those early tests and failures? Well, much like GANs, LLMs are highly dependent on the starting point within the network. Each path through the network is derived from that starting point, and each starting point typically branches into many paths of differing probability, so promising paths can be dropped or missed entirely if the starting point is sub-optimal.

You can see the effect of this by asking the same LLM the same question about the same data many times: it can, and will, give different answers. This is because it works from a starting point and then from probabilistic predictions of the next correct step, or word, through the network.
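The mechanics behind those differing answers can be shown with a toy sketch. Here the "model" is a hypothetical stand-in that always emits the same fixed next-token distribution; a real LLM recomputes the distribution at every step, but the sampling behaviour is the same: different random draws yield different sequences, while greedy decoding (temperature driven to zero) is deterministic.

```python
import random

# Hypothetical fixed next-token distribution standing in for a model's output.
VOCAB = ["contract", "clause", "party", "term"]
PROBS = [0.4, 0.3, 0.2, 0.1]

def generate(seed, length=6):
    """Sample a sequence token by token from the distribution."""
    rng = random.Random(seed)
    return " ".join(rng.choices(VOCAB, weights=PROBS)[0] for _ in range(length))

def greedy(length=6):
    """Temperature -> 0: always take the most probable token."""
    best = VOCAB[PROBS.index(max(PROBS))]
    return " ".join(best for _ in range(length))

print(generate(seed=0))  # one "answer"
print(generate(seed=1))  # a different "answer" to the same question
print(greedy())          # identical on every call
```

The same seed always reproduces the same sequence, which is why fixing the sampling seed is a common trick for making LLM outputs repeatable.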

Optimising Starting Points: Game-Changing for ML Models  

Much like when testing with GANs, providing the best starting point is game-changing. But how can you do that? The first approach is to give the LLM examples to work with. Unlike GANs, LLMs benefit from detailed descriptions or "chain of thought" methods. One-shot learning, where the model is provided with a single example, is more effective than zero-shot learning, where no examples are given. Alternatively, you can offer a larger, more detailed prompt that provides additional context and information.
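The difference between the two styles is just how the prompt is assembled. This is an illustrative sketch: the question, the worked example, and the contract snippet are all invented placeholders, not a real API or real data.

```python
QUESTION = "Does this contract contain an auto-renewal clause?"

# Zero-shot: the model starts "cold", with no example to anchor it.
zero_shot = QUESTION

# One-shot: a single worked example gives the model a better starting point.
one_shot = (
    "Example:\n"
    "Contract text: 'This agreement renews annually unless either party "
    "gives 30 days notice.'\n"
    "Answer: Yes, an auto-renewal clause.\n\n"
    f"Now answer in the same way.\n{QUESTION}"
)

print(len(zero_shot), len(one_shot))
```

The one-shot prompt is longer, but the extra tokens are doing the same job as a good initialisation in the GAN case: steering the model toward the right region of its answer space before generation begins.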

The Future: A Better Way

But is there a better way? The answer to that question is to be found in what we are building and creating within Digital Mirror. It requires a new way to map all the information within organisations and automated methods of labelling it. The days of training models on data lakes and digital oceans are over. Now we must travel through the data desert, stopping at oases for better, smaller, and more accurate predictive language models.

In the next post, I will talk about the next stage of the revolution and how, having lived through the same hardware and software convergence with machine learning and contract analytics, we are starting to see a picture of what the future holds and what challenges it entails for enterprises.