Generative AI on the Cheap

Getting started with Generative AI can be daunting. It's hard to know where to begin, and there are plenty of options, each with its pros and cons. This brief article will highlight some of the paths you can take and also show you a quick and easy (and virtually free) approach to get you up and running with Generative AI in your applications.

AI is Expensive

Let's face it: the Average Joe, or the Average Joe Company, isn't going to be developing AI, generative or otherwise, anytime soon. It's just too expensive. By some accounts, each training run for a GPT-3-class model, the kind behind ChatGPT (the darling of Generative AI), costs roughly $5M for the GPUs (Graphics Processing Units) alone, and some estimates are considerably higher. The total cost to train such models is believed to be more than $100M. And that's not even including development and R&D costs and all the other things that go into running a company. (Curiously, I asked ChatGPT for an estimate and it declined to provide one.)

The real problem isn't the underlying neural networks that provide the foundational technology. Those have been around for quite some time. It's the mountains of training data needed to build an accurate model. The advances we've seen in Generative AI are driven more by increasingly available computational power and the sheer volume of training data accessible on the Internet than by any one specific advance in neural network technology.

If you train a neural net on anything but a very large data set, your results are not likely to be satisfactory. I once trained a TensorFlow neural network on a few thousand song charts in hopes of creating a song generator, and the result was a musical disaster, albeit an entertaining one. On the other hand, a simple model using n-grams (2-grams, specifically), trained on that same data set, yielded some very nice, production-worthy musical results. Sometimes, simple works just fine. But not for Generative AI.
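To make the 2-gram idea concrete, here is a minimal sketch of such a model. It is not the generator described above: the chord data is a made-up toy set, and the function names trainBigrams and generate are hypothetical. It only shows the technique, which is to record which symbol follows which, then walk that table at random.

```javascript
// Build a bigram (2-gram) table: for each symbol, the list of symbols that
// followed it somewhere in the training sequences.
function trainBigrams(sequences) {
  const table = new Map();
  for (const seq of sequences) {
    for (let i = 0; i < seq.length - 1; i++) {
      const followers = table.get(seq[i]) ?? [];
      followers.push(seq[i + 1]);
      table.set(seq[i], followers);
    }
  }
  return table;
}

// Walk the table from a starting symbol, picking each next symbol at random
// in proportion to how often it followed the current one in training.
// `random` is injectable so the walk can be made repeatable.
function generate(table, start, length, random = Math.random) {
  const out = [start];
  while (out.length < length) {
    const followers = table.get(out[out.length - 1]);
    if (!followers) break; // dead end: nothing ever followed this symbol
    out.push(followers[Math.floor(random() * followers.length)]);
  }
  return out;
}

// Hypothetical toy data: chord progressions standing in for song charts.
const charts = [
  ["C", "F", "G", "C"],
  ["C", "Am", "F", "G"],
  ["Am", "F", "C", "G"],
];
const table = trainBigrams(charts);
console.log(generate(table, "C", 8).join(" "));
```

Because duplicates are kept in each follower list, a transition that appears more often in the training data is picked more often, with no explicit probability math.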

How About a Pre-Trained Model?

But don't despair, you've still got options. Perhaps you can leverage a pre-trained model, and there are plenty to choose from, especially if you are comfortable with Python, the most popular language for delivering these pre-fab AI tools. Sources for pre-trained models include Hugging Face and Kaggle. One particularly interesting example is Google's Gemma model, found on Kaggle, which Google describes as built from the same research and technology as its Gemini Generative AI tool. The Gemma model is roughly 1.4 GB, a manageable size.

Once we've selected a pre-trained model, if we're lucky we can use it as is. If not, perhaps we can "tweak" the model to be more effective for our particular problem by fine-tuning it using our data. Here's a nice example showing how to do that for the motivated. And by "motivated" I mean "highly motivated". It's not for the faint-hearted. But if you've got a sizable team of smart engineers it's an option.

While 1.4 GB will fit on most developers' laptops, a very nice convenience, you'll still need some relatively powerful machines to run your customers' queries once you've finished development and deployed your product. We're likely still in the affordable range for the Average Joe Company, but probably not for the Average Joe.

Naturally, we should investigate the plethora of AI tools at our fingertips in the cloud. AWS, GCP, and Azure all provide powerful product offerings. Brace yourself though, as these tools typically require high-end machines, most of which don't qualify as "on the cheap".

But before you race down the pre-trained path take note — even big, powerful companies with plenty of engineers, like Salesforce, often partner with domain experts to build their customized products. Salesforce's Einstein GPT, for example, allows customers to pipeline their data into OpenAI's ChatGPT to do all kinds of nifty things: customized help, developing marketing insights from customer data, and helping author emails and other communications. If you've got the cheddar and want to get a new product to market quickly, it pays to team up with the experts. If a company like Salesforce needs expert help, perhaps you do too.

Let's Do It

But never mind, we're here to forgo all that and get something going quickly, for the Average Joe, and most importantly, on the cheap. Both Google and OpenAI offer APIs that are "free to use", at least for now. The biggest downside to both is that they are severely rate-limited, unless, of course, you're willing to pay. At the time of writing, Gemini is limited to 60 requests/minute and GPT-3.5 (the model behind the free ChatGPT) is limited to a measly 3 requests/minute. Easy choice here: let's start with Gemini.
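If you'd rather not find out about the limit from an error response, you can keep count on the client side. The following is a minimal sketch of a sliding-window limiter, assuming the 60 requests/minute figure quoted above; createRateLimiter and tryAcquire are hypothetical names, not part of any Google or OpenAI SDK.

```javascript
// Sliding-window limiter: allows at most `limit` calls per `windowMs`.
// `now` is injectable so the logic can be exercised without real waiting.
function createRateLimiter(limit, windowMs, now = Date.now) {
  let timestamps = [];
  return {
    // Returns 0 if a request may go out now (and records it),
    // otherwise the number of milliseconds to wait before retrying.
    tryAcquire() {
      const t = now();
      // Forget requests that have aged out of the window.
      timestamps = timestamps.filter((ts) => t - ts < windowMs);
      if (timestamps.length < limit) {
        timestamps.push(t);
        return 0;
      }
      return windowMs - (t - timestamps[0]);
    },
  };
}

// Assumption: 60 requests per minute, the free-tier figure quoted above.
const geminiLimiter = createRateLimiter(60, 60_000);
```

Before each API call, check geminiLimiter.tryAcquire(); a non-zero result is how long to wait before trying again.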

Let's see how we can add Gemini to Gizmo CMS, a free "personal" CMS I use as a test ground for new ideas. Gizmo provides its functionality via a plugin-based architecture and it runs entirely in the browser as a Progressive Web App. Because Gizmo is free, it needs to be cheap, meaning minimal — preferably zero — server-side components. So, our job here is to create a Gizmo plugin and connect it to Gemini's free API with both minimum effort and cost.

The essence of our Average Joe effort is to connect our plugin to the Gemini API. The total time to get this going was less than a day, and we can avoid half of that day with a tip to avoid a tricky "Missing Authentication Token" error from AWS's API Gateway.

First, sign up for a Gemini API key. Once you've got that in hand, try a curl command from your terminal to make sure it works:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=<YOUR_API_KEY>" -H 'Content-Type: application/json' -X POST -d '{"contents": [{"parts":[{"text": "What is the capital of the United States?"}]}]}'

You'll know if the answer is right. Remarkably, we're almost there, at least as far as the Gemini API is concerned. Cheap so far. While the curl POST command above works great from your laptop's terminal, it won't work in the browser, because of the browser's same-origin policy and the CORS (Cross-Origin Resource Sharing) rules that go with it, which most web developers have run into.

The browser is an amazing tool: we all use it every day to safely (more or less) surf the Internet. Behind that safety are a whole lot of built-in security features, and this is one of them. Essentially, scripts in our PWA can't read responses from any domain other than the one that delivered the app, unless that other domain explicitly allows it with CORS headers. So we need an endpoint we control, one that answers with a CORS header naming our domain, gizmocms.com, and relays the request to the Gemini API, so the browser can talk to it.

The AWS Cloud provides a ready-made solution for us via its API Gateway, allowing us to proxy our browser's API call over to the Gemini API, and pass the results back to the browser. There are other simple-ish cloud-based solutions (e.g., create a proxy server via Lambda), but they have their challenges, and this approach is by far the easiest, cheapest, and quickest.

There are plenty of videos and articles on the Net to show you how to do it, so there is no need to repeat that here — for example, try this one. Once finished, we'll end up with a URL to our proxied API that looks like this:

https://<random-characters>.execute-api.us-east-1.amazonaws.com/prod

We can replicate the curl POST command shown previously using the browser's JavaScript fetch function, swapping out the URL in the curl command for the URL above. Be sure to allow your domain in the AWS API Gateway via the CORS header (a configuration setting) so your app can POST to this URL:

Access-Control-Allow-Origin: <your-domain>
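Here is a minimal sketch of that fetch call. PROXY_URL is a placeholder, and the function names are hypothetical. How the API key reaches Gemini (appended to the URL as in the curl command, or added by the gateway) depends on how you set up your proxy, so it is not shown here. The response parsing assumes the usual generateContent shape, with the text under candidates[0].content.parts.

```javascript
// Placeholder: replace with the invoke URL of your own API Gateway stage.
const PROXY_URL = "https://example.invalid/prod";

// Build the same JSON body the curl command sends.
function buildGeminiBody(prompt) {
  return { contents: [{ parts: [{ text: prompt }] }] };
}

// Pull the generated text out of a generateContent response,
// or return null if the response carries no text.
function extractText(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? [];
  const text = parts.map((p) => p.text ?? "").join("");
  return text.length > 0 ? text : null;
}

// POST the prompt to the proxy. `fetchImpl` is injectable for testing.
async function askGemini(prompt, url = PROXY_URL, fetchImpl = fetch) {
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGeminiBody(prompt)),
  });
  if (!res.ok) throw new Error(`Gemini proxy returned HTTP ${res.status}`);
  return extractText(await res.json());
}
```

In the app, a call would look something like askGemini("What is the capital of the United States?").then(console.log).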

Now, chances are, you'll get a response like "Missing Authentication Token". The API Gateway, unfortunately, uses this message as a catch-all for lots of possible problems, so it's very misleading. When you start Googling for a solution, you'll find countless remedies that probably won't work (but try them anyway to make sure you've covered all the bases). Now try this: add a random character to the end of the URL path above, like this:

https://<random-characters>.execute-api.us-east-1.amazonaws.com/prod/x

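Voilà! It seems the API Gateway requires something after the stage name in the URL path to fill the {proxy+} path variable. A request to the bare stage URL matches no configured route, and the gateway reports that as a missing token (this will make sense once you go through the process of setting up your proxy on AWS).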
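To avoid forgetting that trailing segment, you could normalize the URL in code. This is a small sketch under one assumption: the first path segment is the stage name (prod here) and anything after it fills {proxy+}. The name withProxySegment is hypothetical.

```javascript
// Make sure the URL has at least one path segment after the stage name,
// so that an API Gateway {proxy+} resource has something to match.
function withProxySegment(stageUrl, segment = "x") {
  const url = new URL(stageUrl);
  const parts = url.pathname.split("/").filter(Boolean);
  // Assumption: parts[0] is the stage name; only pad when nothing follows it.
  if (parts.length < 2) parts.push(segment);
  url.pathname = "/" + parts.join("/");
  return url.toString();
}

console.log(withProxySegment("https://example.invalid/prod"));
```

URLs that already carry a path after the stage name pass through unchanged, query string included.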

Build the Gizmo Plugin

Our last step is to build something that can incorporate our new Generative AI endpoint. I built a Gizmo plugin that lets me select the content I want to send to Gemini embedded inside a query via a substitution variable (${content}), and then write the result back into another selected content field—details and an example here. The final result is shown below.

Gizmo Gemini Plugin
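The substitution step itself is tiny. Here is a minimal sketch of how a ${content} placeholder might be filled in before the query goes to Gemini. This is not Gizmo's actual plugin code; fillQuery and the sample strings are hypothetical.

```javascript
// Replace every ${content} placeholder in the query template with the
// selected content. The placeholder is a plain string here, not a
// JavaScript template literal.
function fillQuery(template, content) {
  // split/join sidesteps the special meaning of "$" sequences in the
  // replacement argument of String.prototype.replace.
  return template.split("${content}").join(content);
}

const queryTemplate = "Summarize the following in one sentence: ${content}";
console.log(fillQuery(queryTemplate, "Gizmo is a free personal CMS that runs in the browser."));
```

The filled-in string is what would be sent as the text part of the request, and the response text would then be written into the target content field.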

And that's it. Rate limited, but free, and good enough for now. Thanks for reading and I hope you'll give it a try in your projects.