Introduction
Successful projects need to be planned, and they need a way to be measured. If you want yours to succeed, you have to give it the time, resources, and attention it needs—and you have to know what success looks like. In the rest of this blog, we’ll get into why that’s just as true for AI projects as it is for engineering ones.
Understanding the Challenges of LLM Implementation
A friend of mine recently shared details of a new project they were asked to lead at work. (This conversation didn’t actually happen—but for the sake of this blog, let’s pretend.)
The idea was great: use a large language model (LLM) to automate a process that normally takes skilled professionals a full day to complete. The LLM was expected to handle the task in seconds. On paper? Sounds impressive.
So, I asked the obvious questions.
Me: What training data do you have?
Them: A handful of examples.
Me: How much dedicated time do you have?
Them: I’ll be doing it alongside my regular job.
Me: What’s the budget?
Them: There isn’t one.
At that point, I did what I’ve seen a million plumbers do in a sitcom—I sucked air through my teeth and said, “Hmm… good luck.”
Don’t get me wrong—my friend is smart and could probably do a great job on this project, but this is a data science project, and like all data science projects, it needs time, data, and money to run the computers.
The Pace of AI Development
Almost 10 years ago, AlphaGo beat Lee Sedol 4-1 in a five-game match. This was exciting because beating humans at Go was considered a significant benchmark for computers, given the complexity of the game. Six years ago, GPT-1 and BERT came out. Now DeepSeek has arrived, and some people speculate that the really cutting-edge LLMs are hidden in OpenAI's basement so they can keep them for themselves (or they're too expensive to run at a profit). The rate of change of these models is exciting (and maybe a bit existentially terrifying). The possibilities are mind-boggling. Maybe hard AI, AGI, ASI, or whatever you want to call it is around the corner (although with rumors that OpenAI is basing that milestone on profit generated by the model, I'm not holding my breath (source)).
That said, the current reality is that, like any tool, LLMs still make mistakes. If you want to use an LLM in a system that removes skilled, attentive human oversight, either those mistakes had better not matter, or you need to know how good your solution is and what performance you consider acceptable.
Measuring Success: “How Good Does It Need to Be?”
When I'm working with a client and they ask me to train a model or to come up with a new solution, I like to ask one question: "How good does it need to be?" This is a question I love debating (DM me and we can argue about it). Good enough is good enough! The corollary is that not good enough is not good enough! Of course it is and is not, respectively! I'm not an advocate for doing a bad job, but perfect does not exist. Even perfectionists have a threshold they are working towards; it just might be slightly nebulous (and maybe a bit scary if you can get them to be honest with you; mine is not being able to imagine my dad's disappointed face when he looks at this piece of work, right dad?).
That tangent aside, if you want a tool, you need to know whether it is good enough to do the job you need it to do. A tape measure is the perfect tool, unless you need vernier calipers or an interferometer. So: define the requirements for your new tool, then measure it to check that it meets them.
The Key to Implementation: Data and Strategy
How do you measure the performance of your solution? Shockingly, you need data for that. A fair bit. And unless you stick to banal applications of LLMs, you will probably need data to train your model too. The difference between basic regression models and LLMs is that LLMs have a massive head start (in addition to the mountains of carbon dioxide and stock-market value generated as by-products of training them). They are very, very clever regression models (maybe I am, too; DM me and we'll get into it), trained very well, but they are not trained for what you want them to do. You need to do that with prompt engineering and your RAG database.
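To make the prompt-engineering-plus-RAG idea concrete, here is a minimal sketch of the retrieval half of such a pipeline. Plain token-overlap scoring stands in for a real embedding model, and the documents, query, and prompt template are all hypothetical placeholders, not a recommended production design.

```python
# Minimal RAG-style retrieval sketch: score documents against a query,
# then stuff the best matches into a prompt for the LLM.

def tokenize(text: str) -> set[str]:
    """Lowercase bag-of-words; a real system would use embeddings."""
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved context into the prompt sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Hypothetical knowledge base for the process being automated.
documents = [
    "Invoices must be approved by a manager before payment.",
    "Expense reports are due by the 5th of each month.",
    "Travel bookings require pre-approval for amounts over $500.",
]

context = retrieve("When are expense reports due?", documents)
prompt = build_prompt("When are expense reports due?", context)
```

The design point is that the model itself is untouched: all the domain adaptation lives in what you retrieve and how you phrase the prompt, which is exactly why the quality of your RAG database matters so much.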
If you don't give them lots of context, they won't be able to do it. And if you stuff up your training/testing/validation datasets, you will end up with a model that doesn't generalize the way you think it does, and your performance estimates will be wrong. It doesn't matter if it's the latest GPT, DeepSeek, or a linear regression model. You need a data scientist (hopefully from ProCogia) to get that right.
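Getting those datasets right starts with a clean, disjoint split. Here is a minimal sketch: shuffle once with a fixed seed so ordering can't leak into the split, then cut into three non-overlapping sets. The 80/10/10 ratios are an illustrative assumption, not a rule.

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle once, then cut into disjoint train/val/test sets."""
    shuffled = examples[:]                  # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is reproducible
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```

The test set should then be locked away and touched only for the final performance measurement; evaluating on data the model (or your prompt tuning) has already seen is exactly how performance estimates end up wrong.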
Project Planning Framework
Let's talk about getting it right, then. What does that look like? You need to follow these 10 steps:
1. Define the Purpose
What do you want the tool to do? Be specific about the input and the desired output.
2. Establish Performance Metrics
Are you measuring accuracy, precision, recall, or something else? Define how you’ll know if it’s working. (ProCogia can help identify what matters most for your use case.)
3. Set Performance Thresholds
How good could it be in an ideal world—and how good does it need to be to replace or improve your current system?
4. Benchmark Against Current Solutions
What’s your baseline? How will you know when your new solution is better than what you’re already using?
5. Inventory Your Data Assets
What historical or labeled data do you already have? Is it relevant or usable?
6. Clarify Timelines
When do you need this to be deployed or deliver value? Planning around realistic timeframes is critical.
7. Define Your Budget (Even if It’s Fuzzy)
You don’t have to commit to a number upfront, but understanding rough limits around cost, compute, and staff time will help you scope correctly.
8. Plan the Work
Map out phases of the project, whether using Agile, waterfall, or something in between. Assign responsibilities and checkpoints.
9. Execute the Work
This is the build phase—where data scientists (ideally from ProCogia!) help bring the model to life.
10. Iterate and Improve
The best AI solutions go out of date. Evaluate results, make improvements, and keep looping back through the checklist as needed.
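Steps 2 through 4 can be sketched in a few lines: compute your chosen metrics on held-out labels, then compare them against the pre-agreed threshold and the current baseline. The labels, threshold, and baseline numbers below are made-up illustrations, not recommendations.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Binary precision and recall from parallel label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical held-out test labels vs. model predictions.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)

baseline_recall = 0.60   # step 4: the current solution's performance (made up)
required_recall = 0.70   # step 3: the agreed "good enough" threshold (made up)

good_enough = recall >= required_recall and recall > baseline_recall
```

The point is less the arithmetic than the discipline: the threshold and baseline are written down before you evaluate, so "good enough" is a decision you made in advance, not one you rationalize after seeing the number.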
ProCogia can help you through these steps, whether you had a vague idea in the bath yesterday, have a proof-of-concept prototype, or would like our help iterating on your model to achieve state-of-the-art performance.
Final Thoughts
Which problems are tractable, and how fast solutions can be implemented, may change as the tools (read: LLMs) get better and better. But if you want to trust that your solution does what you need it to do, you have to treat the problem seriously. I haven't been inside OpenAI's or NVIDIA's offices, but I bet they do the same thing, and I bet they know perfectly well that these steps are required.
Summarizing this rambling diatribe: LLMs are exciting, and I am lucky to get to work with them. They are powerful, they have changed the world, and they will continue to do so. Even if we get a hard AI like R. Daneel Olivaw or the Technocore, it will still have to follow good engineering principles to produce well-engineered solutions.