Higashi.blog Random notes on cryptography and engineering

I built a Coffee Beans Tracker that does OCR inventory management, with GPT and no code

I brew coffee and I buy a lot of beans. I’m also a lazy person who doesn’t want to type all the details into one of the available apps and track things down to the gram.

Luckily I found out that GPT-4o (free tier) is good enough to take a few photos I uploaded of a bag and parse its roaster, varietal, origin, tasting notes, roast date, etc. I tried it with a dozen different bags and it works pretty well - even when it makes mistakes I can correct them in natural language conversation, which I enjoyed.

Then I had to figure out storage. I don’t want to rely on the chat history / context as a way to store info, as that can easily get lost. I discovered GPT Actions. My immediate thought was to give GPT access to an online scratchpad that it can read and write, and call it a day. That’s how I ended up integrating it with GitHub Gist. I created an empty gist file and generated an access token that can read/write gist files. I hardcoded this token into GPT Actions (and struggled quite a bit teaching it how to use the gist API), and voila, it was able to write stuff to my gist file!
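For reference, here is roughly what the "write to the scratchpad" request looks like if you make it yourself in Python. This is only a sketch: the GPT Action is configured with an API schema rather than code, and the function name, gist ID, filename and token below are placeholders I made up. The endpoint (PATCH on /gists/{gist_id} with a "files" object) is GitHub's gist update API.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_gist_update(gist_id: str, filename: str, content: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that overwrites one file in a gist."""
    # The gist update endpoint takes a "files" object keyed by filename.
    body = json.dumps({"files": {filename: {"content": content}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/gists/{gist_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # the hardcoded token mentioned above
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage; the ID and token are placeholders, not real values.
# req = build_gist_update("<gist-id>", "beans.csv", "roaster,origin\n", "<token>")
# urllib.request.urlopen(req)
```

Reading the scratchpad is the same idea with a GET on the same URL and no body.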

I simply populated some example content in that file - a CSV-encoded list of coffee beans, with the first row describing the columns that I’m interested in. GPT is then smart enough to figure out how to read this CSV, work out how much coffee I have to answer queries, append a new bag to the end of the CSV, and even sort its entries! It also made some terrible mistakes that I had to manually correct, but these are CSV syntax mistakes and they don’t seem to bother GPT at all. Here is a history of changes GPT (and I) made to that gist file if you are interested.
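To make the scratchpad format concrete, here is a small Python sketch of the two operations GPT performs on that CSV: totalling what is left and appending a bag. The column names and the sample row are my own illustration, not the exact contents of the gist. GPT does all of this by reading and rewriting the text itself, which is also why the syntax mistakes creep in; the csv module handles quoting for you.

```python
import csv
import io

# Hypothetical header and sample row; the real gist may use different columns.
SAMPLE = (
    "roaster,origin,tasting_notes,roast_date,grams_left\n"
    "Example Roasters,Ethiopia,blueberry,2024-05-01,250\n"
)


def total_grams(csv_text: str) -> int:
    """Sum the grams_left column across all bags."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["grams_left"]) for row in reader)


def append_bag(csv_text: str, bag: dict) -> str:
    """Return new CSV text with one more bag appended, keeping the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    rows.append(bag)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return out.getvalue()


updated = append_bag(SAMPLE, {
    "roaster": "Another Roaster",
    "origin": "Colombia",
    "tasting_notes": "peach, jasmine",
    "roast_date": "2024-05-10",
    "grams_left": "340",
})
print(total_grams(updated))
```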

Then I talked about it with friends, and all of a sudden they got interested and wanted to try this too. I can’t share my GPT because it hardcodes my API token - they would be writing onto my beans list instead of theirs. Moreover, GPT is unreliable: even if I manage to separate users by a specific column, I’m sure my friends can find a way to trick GPT into just wiping the entire gist file and ruining the fun.

Last weekend I was thinking about how to make this multi-tenant. I think GPT Actions attaches some user identifier on each request but I’m not so sure whether it’s reliable. Then I found that you can also do OAuth in GPT Actions instead of a plain API token - ideally this should allow users to log in to their own account and claim their own piece of storage somewhere.

I found a blog post from OpenAI about how to authenticate GPT Actions with an AWS Cognito user pool. I decided this is it - I just fired up an AWS account and did the full serverless stack: Cognito for OAuth / the user pool, API Gateway for gating access, Lambda for processing requests and giving GPT read/write access to its scratchpad (literally a string that I save) keyed under the Cognito user ID, and a DynamoDB table that stores the data.
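The Lambda part is small enough to sketch here. This is not my actual function, just a minimal version of the same idea with two assumptions: the API is an API Gateway REST API with a Cognito user pool authorizer (which puts the token claims under requestContext.authorizer.claims, with "sub" being the user ID), and a plain dict stands in for the DynamoDB table so the snippet runs anywhere.

```python
import json

# Stand-in for the DynamoDB table: maps Cognito user ID -> scratchpad string.
# In a real deployment this would be table reads/writes instead of a dict.
SCRATCHPADS = {}


def handler(event, context):
    """Get or put the caller's scratchpad, keyed by their Cognito user ID."""
    # Assumes a Cognito user pool authorizer on a REST API; the event shape
    # is different for other authorizer types.
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]
    method = event["httpMethod"]

    if method == "GET":
        content = SCRATCHPADS.get(user_id, "")
        return {"statusCode": 200, "body": json.dumps({"content": content})}

    if method == "PUT":
        payload = json.loads(event.get("body") or "{}")
        SCRATCHPADS[user_id] = str(payload.get("content", ""))
        return {"statusCode": 200, "body": json.dumps({"ok": True})}

    return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}
```

Because the key comes from the verified token and not from anything GPT sends in the request body, one user cannot talk the model into reading or wiping another user’s scratchpad.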

After tinkering with it for 1-2 hours, I got something working. I can log in / register an account with Cognito, and then GPT gets that persistent scratchpad to do its thing. It worked for me and it worked on my partner’s phone as well. I’m happy that I didn’t do any serious development work throughout the process and GPT was able to do approximately all the things that I intended. I would like to claim that I wrote NO CODE for this, but I had to write a simple get/put adapter in the Lambda function, though most of that was copy-pasted from a tutorial anyway ;)

There you have it. I really like how easily you can prototype an idea just by giving a GPT a scratchpad. This can of course be repurposed for more than coffee management. Maybe I can think of other, better ideas down the road.

Here is a setup link for anyone interested in trying out this new multi-tenant version of my bean tracker. Sorry for being a bit sketchy on everything, but the idea is to make sure I don’t do any serious dev work and still achieve the functionality.