ChatGPT has blasted into our consciousness over the last 2 months. It has created excitement and dismay in equal measure: excitement as people see the opportunities for leveraging its capabilities, and dismay at the problems they see it creating.
Both reactions are justified, but if we are to judge which side of that fence to sit on, it makes sense to understand a little about how it works.
These AI tools work on letters and groups of letters, which make up words. They calculate the probability of one letter following another, and then another, and then of one word following another, and another.
There are about 40,000 commonly used words in English, and billions of words published. From this database, computation can give you the probability of one letter following another. For example, the probability of a U following a Q is very high, while the probability of a V following an L is low. This probability logic is extended to groups of 3, 4, or 5 letters, one probability calculation at a time. The outcomes of those cascading calculations assemble letters into the groups that make up words, based on the text used to ‘train’ the software.
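To make the letter-following-letter idea concrete, here is a minimal sketch in Python. It is an illustration only, not how ChatGPT is actually built: the sample sentence is made up, and the function name `letter_follow_probabilities` is my own. It simply counts which letter follows which in a scrap of text and turns the counts into probabilities.

```python
from collections import Counter, defaultdict

def letter_follow_probabilities(text):
    """Estimate the probability of each letter following another letter."""
    pair_counts = defaultdict(Counter)
    for word in text.lower().split():
        # Keep letters only, so punctuation does not count as a "letter".
        letters = [ch for ch in word if ch.isalpha()]
        for current, following in zip(letters, letters[1:]):
            pair_counts[current][following] += 1

    probabilities = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probabilities[current] = {ch: n / total for ch, n in followers.items()}
    return probabilities

# A tiny made-up sample standing in for the billions of words of training text.
SAMPLE_TEXT = "the queen quietly asked a quick question about the quality of the liquid"

probs = letter_follow_probabilities(SAMPLE_TEXT)
q_then_u = probs["q"].get("u", 0.0)
l_then_v = probs["l"].get("v", 0.0)
print(f"P(u follows q) = {q_then_u:.2f}")
print(f"P(v follows l) = {l_then_v:.2f}")
```

In this small sample every Q is followed by a U, and no L is followed by a V, so the two probabilities come out at 1.00 and 0.00. With billions of words the numbers would be less extreme, but the pattern would be the same.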
Many words have multiple meanings, depending on the context in which they are used. These are homonyms. Sometimes the spelling is different, but the words sound exactly the same. We understand what is meant by the context in which the word appears. For example: if I said, ‘I am on leave’, everyone knows I am on holiday. By contrast, if I said, ‘I am going to leave’, it means I am about to depart whatever event we were at. I might also leave something for you at the door.
The word ‘leave’ is spelt and pronounced exactly the same way every time; it is the context in which it is used that makes the difference.
The juxtaposition of words also makes a difference to our understanding. If you remember your primary school grammar, it is all about the position of the subject and the verb.
If I were to say ‘I am going to leave the party’, the subject, the verb, and the object are in the correct position in English for easy understanding. If I were to say ‘the party I am going to leave’, most would understand, but would be expecting me to say more. The words are identical; it is just their position that has changed.
Linguists have studied these relationships for years. Their mantra, from the linguist J.R. Firth, is: ‘You shall know a word by the company it keeps.’
If you take this to its logical extreme, the position of every word in a body of text has an impact on the understanding of every other word, and every group of words, in the same body.
If the text surrounding my sentence is about going to a friend’s place for a drink, it is highly probable that the ‘party’ is a social event. On the other hand, if the surrounding words were about politics, the phrase ‘I am leaving the party’ takes on a completely different meaning. The software handles all of this through probability. When the words ‘friends’ and ‘drinks’ are in the surrounding copy, the party I am leaving is most likely a social one. Should those surrounding words be ‘government’ and ‘policy’, it is more likely the party I am leaving is a political one.
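The ‘party’ example can be sketched in a few lines of Python. This is a toy under stated assumptions, not the method ChatGPT uses: the four example sentences, the list of ignored words, and the function names are all made up for illustration, and a real system learns these associations from vast amounts of text rather than a hand-written list.

```python
from collections import Counter

# Hypothetical, hand-written examples of each meaning of 'party'.
EXAMPLES = {
    "social event": [
        "we went to a friend's place for drinks and the party ran late",
        "friends brought music and drinks to the birthday party",
    ],
    "political group": [
        "the party announced its new policy on housing",
        "the government and the opposition party argued over policy",
    ],
}

# Common words (and 'party' itself) that say nothing about which meaning applies.
IGNORED = {"the", "a", "and", "to", "i", "am", "for", "on", "its", "we", "party"}

def build_context_counts(examples):
    """Count the words that keep company with each meaning."""
    counts = {}
    for sense, sentences in examples.items():
        counter = Counter()
        for sentence in sentences:
            counter.update(w for w in sentence.lower().split() if w not in IGNORED)
        counts[sense] = counter
    return counts

def sense_probabilities(context, counts):
    """Score each meaning by how often the context words appeared alongside it."""
    words = [w for w in context.lower().split() if w not in IGNORED]
    # Start every score at 1 so that no meaning is ever ruled out completely.
    scores = {sense: 1 + sum(counter[w] for w in words)
              for sense, counter in counts.items()}
    total = sum(scores.values())
    return {sense: score / total for sense, score in scores.items()}

counts = build_context_counts(EXAMPLES)
social = sense_probabilities(
    "I had drinks with friends then decided I am leaving the party", counts)
political = sense_probabilities(
    "after the row over government policy I am leaving the party", counts)
print(social)
print(political)
```

The first sentence scores higher for the social meaning because ‘drinks’ and ‘friends’ kept company with it in the examples. The second scores higher for the political meaning because of ‘government’ and ‘policy’.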
The systems built by OpenAI, and others, have scraped the web for vast amounts of published text, and stuck it into what amounts to a huge multidimensional spreadsheet. The machine calculates the probability of any one letter appearing after another, then of any word appearing next to another, based on the occurrences of those letters, words, and groups of letters and words in the scraped text. It does this over and over again, spreading the web of probabilities of words and groups of words appearing together, in a particular order, wider and wider, one word at a time, across the body of copy.
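The ‘one word at a time’ process can also be sketched simply. Again, this is a heavily simplified illustration, assuming a made-up training text of three short sentences and looking back only one word. Real systems take far more of the surrounding text into account, and the function names here are my own.

```python
from collections import Counter, defaultdict

def train_next_word_counts(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(word, counts):
    """Return the word seen most often after `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(start, counts, max_words=7):
    """Build a phrase one word at a time, always taking the most likely next word."""
    output = [start]
    while len(output) < max_words:
        next_word = most_likely_next(output[-1], counts)
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)

# A made-up scrap of text standing in for the scraped web.
TRAINING_TEXT = (
    "i am going to leave the party "
    "i am going to leave the room "
    "i am going to leave the party early"
)

counts = train_next_word_counts(TRAINING_TEXT)
phrase = generate("i", counts, max_words=7)
print(phrase)
```

Because ‘party’ follows ‘the’ twice in the training text and ‘room’ only once, the sketch picks ‘party’. Change the training text and the output changes with it, which is the whole point: the response is only ever a reflection of the probabilities in the text the machine was given.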
This process is extraordinarily computationally intensive. It is hugely expensive to build and program machines that can do these enormous sets of calculations on this amount of text.
If you give such a program a general brief, the best it can do is return a general response. The more detailed you can make the brief, and the more explicit the context, the better the machine will be able to use probability to find the combination of words that best matches your requirements, then spit out a response to you.
As a marketer, you understand that when giving a creative brief to an ad agency, the more detail you can give the creatives, the more relevant the creative responses will be. A general brief will give you lots of ordinary creative responses. By contrast, a detailed brief that clearly articulates the target market, product benefits, and the value to be derived from the product’s use, will generate better creative responses.
ChatGPT is no different, so for good results, give it a good brief.
What makes this so powerful for those who are expert in their domains is that they will be able to give better briefs, and so get better results back, which will then be the basis of their creative thinking. This offers the opportunity to improve on the best that has been done to date. For those who are not as expert, their briefs will not be as good, and the context in which the machine defines probabilities will be wider, so the output will be more general, generic, and average. These days, average increasingly does not cut it.
I hope that helps.
For a more detailed and technical explanation of how ChatGPT works, written by an expert, go to the fifth PS at the end of this blog post, published when I first stumbled across ChatGPT in December last year.
Header Credit: Dall-E. The brief was ‘ChatGPT algorithms working hard to compute copy in a surreal setting’