Building ChatGPT Plugins // BrXnd Dispatch vol. 018

A dive into the mind-blowing world of ChatGPT plugin development.

Apr 19, 2023

You’re getting this email as a subscriber to the BrXnd Dispatch, a (roughly) bi-weekly email at the intersection of brands and AI. On May 16th, we’re holding the BrXnd Marketing X AI Conference. Tickets have gone much faster than expected, and the waitlist is now open. I fully expect to release more tickets, but until I can better grasp all the sponsor and speaker requirements, it’s safest to do it this way. Add your name to the waitlist to be immediately informed when more tickets are available. With that said, sponsor tickets are still available, so if you’d like to sponsor the event, please be in touch.

BLUF: Bottom Line Up Front

(This post is a bit longer and more technical than usual, so here’s a BLUF to help you decide if you want to read it.) ChatGPT plugins are much more interesting than I had understood. Technically, you build them by defining your required input and output, and ChatGPT uses its AI both to generate its request to your plugin and to format your plugin’s response. It’s very much using AI as a fuzzy interface, something I talked about in my Prompt to Rule All Prompts post. Not only that, but the way OpenAI has designed the spec makes me believe they’re trying to create a standard that defines how any website or application will interact with chatbots moving forward. If you want to skip ahead, I have some more takeaways at the end.

On Sunday night, I noticed something different when I signed into ChatGPT. I saw the option to “develop your own plugin” listed below the ten offered by OpenAI. My plugin dev access had arrived, and with it, my plans for Monday had changed.

Of course, having access to build plugins and having an idea of what to make or how to do it are very different things. So I reached out to a few friends to see if any had ideas, and Jason Boog suggested I build one to automatically look up YouTube tutorials when you ask ChatGPT for help making something.

That message, which came through at 10:43 am on Monday, sounded fun and pretty straightforward, so I got working. I had heard a bit about how plugins worked, specifically that they had some interesting ways for defining interfaces, but I hadn’t dug in at all. So my first stop was reading the documentation.

The way OpenAI has defined things is super interesting and significantly different than I’d imagined. Essentially what you do is put a file on your site in a specific place (in this case, yourdomain.com/.well-known/ai-plugin.json) that tells ChatGPT what your plugin does and where to find the details about the API’s URLs and the inputs it expects/outputs it delivers. The JSON is your “manifest” and can be thought of as something not entirely dissimilar to robots.txt: a file that can be placed on a website to describe how that site expects to interact with bots. The difference, of course, is that your ai-plugin.json describes the interaction with a chatbot, whereas robots.txt is for crawlers.
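To make the manifest idea concrete, here is a minimal sketch of what such a file might contain, built as a Python dictionary and printed as JSON. The field names reflect the plugin manifest format as I understand it from OpenAI's documentation at the time, and every value (domain, names, descriptions, email) is a placeholder I made up for illustration, not the actual manifest behind this plugin.

```python
import json

# Hypothetical values throughout: the domain, names, and descriptions are
# placeholders, not the real manifest for the plugin described in this post.
manifest = {
    "schema_version": "v1",
    "name_for_human": "YouTube Tutorials",
    "name_for_model": "youtube_tutorials",
    "description_for_human": "Find YouTube tutorials for things you want to make.",
    "description_for_model": (
        "Search YouTube for tutorial videos when the user asks for help "
        "making something."
    ),
    "auth": {"type": "none"},
    # Points ChatGPT at the OpenAPI spec that describes the API itself.
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "hello@example.com",
    "legal_info_url": "https://example.com/legal",
}

# This JSON would be served at https://example.com/.well-known/ai-plugin.json
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The part doing the robots.txt-like work is the combination of the fixed location and the model-facing description: the bot knows where to look, and the description tells it when the plugin is relevant.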

Part of that file points to a different URL where you store an OpenAPI spec for your API. That file is in a language called YAML, which describes configurations like the input and output of APIs. If that’s a bit hard to understand, here’s what it looks like for my simple YouTube app. You can see there are four main components:

1. Info: This is where you describe the basic details of your plugin
2. Servers: The URL where the plugin lives
3. Paths: This describes both the way your plugin expects to receive data from ChatGPT and the URL path that data should be sent to
4. Components: This describes the way your plugin will return data back to ChatGPT

Zooming in on #3, Paths, you can get a sense of how the YAML works. The path “/api/tutorials” is where ChatGPT will send requests for YouTube tutorials, and the parameters describe how ChatGPT should format the data it sends. There is some code I wrote at the end of that URL that can take a YouTube query, call out to the API, and return a set of videos. Nothing fancy.
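For readers who want to see how those four sections fit together, here is a rough sketch of the overall shape of such a spec. It is written as a Python dictionary rather than YAML (OpenAPI accepts JSON as well), and only the path "/api/tutorials" and the "keywords" parameter come from this post. The server URL, operationId, schema name, and response fields beyond title, description, and views are assumptions made for illustration.

```python
import json

# A sketch of the four sections, not the actual spec for this plugin.
# The server URL, operationId, and schema name are hypothetical.
spec = {
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Tutorials",
        "description": "Looks up YouTube tutorial videos.",
        "version": "v1",
    },
    "servers": [{"url": "https://example.com"}],
    "paths": {
        "/api/tutorials": {
            "get": {
                "operationId": "getTutorials",
                "summary": "Search YouTube for tutorial videos",
                "parameters": [
                    {
                        "in": "query",
                        "name": "keywords",
                        "schema": {"type": "string"},
                        "description": "Used to search YouTube for tutorial videos.",
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/TutorialList"}
                            }
                        },
                    }
                },
            }
        }
    },
    # Describes the data the plugin hands back to ChatGPT.
    "components": {
        "schemas": {
            "TutorialList": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "description": {"type": "string"},
                        "views": {"type": "integer"},
                    },
                },
            }
        }
    },
}

print(json.dumps(spec, indent=2))
```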

BRXND SPONSORS

A huge thank you to our sponsors at The Brandtech Group, LinkedIn, Redscout, EZNewswire, Nova, Horizon Big, Otherward, Persistent Productions, McKinney, and The Imaginarium. If you’re interested in sponsoring, please be in touch.

But this bit of YAML is where things start to get a bit mind-bending. In most integrations, an API spec from the system you’re integrating with describes the structure of the data they will send you. For example, when I call the YouTube API, they define both the way I must send them data and the format I can expect to receive it in. This makes sense: part of the power of being a platform people want to integrate with is that those integrators are willing to bend to your needs and write code according to your specifications.

But that’s not really how ChatGPT plugins work, and I can’t say I’ve ever seen anything quite like it. While they have defined a set of specs, the magic of how those specs work is that they allow you, the integrator, to decide how you want ChatGPT to send you data. Again, this is a big shift in the power dynamic that’s made possible because of AI (more on that in a bit). This little bit of YAML does most of the work on the input side of things:

parameters:
  - in: query
    name: keywords
    schema:
      type: string
    description: Used to search YouTube for tutorial videos.

In that snippet, I am telling ChatGPT that when it sends me a request for YouTube tutorials, it should format that request query as keywords=YOUTUBE+SEARCH+QUERY. The AI then takes that definition and structures the input accordingly. That means they just magically figure out how to deliver data in whatever way the plugin requires. It’s hard to explain how mind-blowing this is to me. Instead of spending a couple hours learning the rules for the platform I wanted to integrate with and bending my code to fit that system, the AI found the best way to do it for me. The magic is in these fuzzy borders between the systems, which allows for some very different ways of working.
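Here is a small sketch of what that keywords=YOUTUBE+SEARCH+QUERY format means in practice, using Python's standard library to build the kind of request ChatGPT would generate and to read it back on the plugin side. The host name and the search phrase are made up for illustration, and this is not the code running behind the plugin.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Roughly what ChatGPT has to produce from the parameter definition:
# a query string with a single "keywords" field.
query = urlencode({"keywords": "cupcake decorating tutorial"})
request_url = "https://example.com/api/tutorials?" + query  # hypothetical host

# On the plugin side, the handler reads the same field back out.
received = parse_qs(urlsplit(request_url).query)
keywords = received["keywords"][0]

print(request_url)
print(keywords)
```

The plugin only has to name the field and describe it in plain English. Deciding what goes into that field is left entirely to the model.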

You couldn’t really build an interface like this before AI, but having a large language model that can format text in any way a user needs extends past just asking ChatGPT to write a blog post for you. It also allows the system to format data to meet the requirements of any downstream system. This is crazy!

Ok, enough talk, let’s see the plugin at work. In this video, I’m feeding it a request to give me a cupcake recipe fit for a 12-year-old. After returning the recipe, it recognizes that this might be a good place to include some YouTube tutorials, so it automatically structures a YouTube query and sends it to my plugin. (Again, it was the AI, not the user, that decided this was a good time to use the plugin, based on the description in the YAML.)

As an aside, I posted this video to that same Slack thread at 12:42: two hours after the original idea. I also had a 30-minute call in there. And most of the time spent wasn’t coding. It was getting myself familiar with the OpenAPI YAML spec.

So far, I have only talked about input, but there is also a definition for the output in that YAML. In my case, I send back a bunch of info about the different tutorials I fetched from YouTube, including title, description, views, and so on. That information is delivered as JSON, not formatted text ready for a user. This is the other place where the power of building things that interface with AI shines through. ChatGPT doesn’t care about what info I include as long as I tell them about it. They also don’t care about the presentation because they will deal with that.
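As a rough illustration of the output side, here is a sketch of a function that flattens video records into the kind of plain JSON a plugin could hand back. The nested input shape (snippet and statistics fields) is modeled on what the YouTube Data API returns for video details, but the sample record, the function name, and the exact set of output fields are all assumptions for this example.

```python
import json


def to_plugin_response(items):
    """Flatten YouTube-style video records into plain JSON-ready dictionaries."""
    return [
        {
            "title": item["snippet"]["title"],
            "description": item["snippet"]["description"],
            # View counts arrive as strings in this input shape.
            "views": int(item["statistics"]["viewCount"]),
            "url": "https://www.youtube.com/watch?v=" + item["id"],
        }
        for item in items
    ]


# Made-up sample record; a real plugin would fetch these from YouTube.
sample_items = [
    {
        "id": "abc123",
        "snippet": {
            "title": "Easy Cupcakes for Kids",
            "description": "A beginner-friendly recipe.",
        },
        "statistics": {"viewCount": "1500"},
    }
]

body = json.dumps(to_plugin_response(sample_items), indent=2)
print(body)
```

Note that nothing here is written for a human reader. There is no sorting, no truncation, and no markup, because the presentation is ChatGPT's job.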

Look at the way the response is formatted here:

I actually gave ChatGPT much more info about the videos, but they decided to simply include the title, length, and description. They took the data I returned in a raw format and turned it into this numbered list. They also added the closing line, “Feel free to choose a video that you think will be most helpful and enjoyable for the 12-year-old. Happy baking!” This, again, is wild to me. They’re controlling the entire presentation layer.

Right. Hopefully, I’ve done a reasonable job of explaining how it works and why I think it’s much more interesting than I had been expecting. In my quest to build intuition about AI through tinkering, Monday was a good day. Here are a few thoughts and takeaways in no particular order:

The way this API is defined feels much more like a general specification for how websites and applications will interact with AI chatbots than a one-off plugin implementation. The robots.txt metaphor seems apt.
To that end, it will be fascinating to see if others start to pick up on this and look for these same files in the same format to allow other bots to interact with “plugins.” There’s word that Bing will be turning plugins on soon, and presumably, they will use the same approach. But, what about Google when they launch whatever they’re working on? And Apple, when Siri gets the upgrade?
Plugins don’t feel like the right word when you think about things this way. Plugins describe the interaction between the ChatGPT user and the content, but, like I said, this feels more like a framework for describing how applications interact with AIs and less like plugins to me.
There is real magic in this fuzzy interface stuff. AI can be a layer that allows systems to interact in the same language. We’ve seen that with the translation capabilities of large language models, but formatting code both ways in real-time is a different thing entirely.
Controlling the presentation layer in the way ChatGPT does has very interesting implications for advertising. I was trying to articulate some of this in a Twitter conversation yesterday. It’s not that I think they’re going down the ad road, but this plugin model and the fact that ChatGPT (or whoever controls the bot) completely owns the way content is presented creates fascinating opportunities to insert ads. Again, I don’t imagine that is what they’re thinking about, but once you get into interacting with outside websites and applications through AI, the intent signal, particularly around some queries (shopping, cooking, etc.), should be right up there with Google.

Let me know if you have any questions, thoughts, or ideas. Comments, emails, and messages are all welcome. And, of course, see you in May at the BrXnd Marketing X AI Conference! It’s coming up fast.

— Noah

PS - If you haven’t joined the BrXnd.ai Discord yet and want to talk with other marketers about AI, come on over.
