Building an LLM Startup in 2024

There’s been a huge uptick in ‘generative AI’ startups lately, and as is usual during hype cycles, there are some really high quality startups and some that missed the mark. This article isn’t meant to be critical of any specific startups or approaches to using LLMs. Instead, I want to share what I have seen work very well for businesses that I have worked with, along with recommendations for finding truly high leverage solutions for your business using LLMs.

I’ve written this in the format of FAQs so it is easy to browse through. Feel free to jump around to the questions that seem more interesting to you.

This is not meant to be a technical guide to LLMs. The intended audience is founders, and not necessarily technical founders. If you’re looking for more technical references, you can check out my pages on LLM Readings or Getting Started with Large Language Models.

What are LLMs really good at?

I think this is a really critical question to start with. On the surface, LLMs feel incredibly powerful, so we often feel inclined to be incredibly ambitious with our applications of them. However, if you understand some of the core tasks that LLMs perform really well, it becomes easier to visualize products that are plausible, scale cost effectively, and are reliable to use.

Examples of tasks that LLMs excel at:

Turning unstructured data into structured data

I think this is by far the most valuable task you can now perform with LLMs that was previously very difficult. There are a lot of really good implementations of this that I’ve personally witnessed.

Resume parsing: In the recruiting space, there have been a large number of incredibly complicated resume parsers locked behind paid APIs or buried inside products like applicant tracking systems (ATS) that are not available for general use. Resume parsing can actually be done really efficiently with LLMs, with a high degree of correctness.

Extracting contact information from emails: This is a problem that I’m specifically targeting with my startup Input. Email inboxes contain large amounts of unstructured data, and there are entire companies targeted at trying to make meaning from our endless streams of emails.

Faster ID verification: OCR (ML that can take photos of text and turn them into text) is a well established technology and has been for a long time. On its own, though, it often produces what looks like a complete mess of text (such as what you’d get if you tried to extract text from a driver’s license). LLMs can take that mess and turn it into meaningful structured data (e.g. “tell me the expiration date of this driver’s license”). I’ve seen this technology get used in a few different places where the source state of an ID is not necessarily known beforehand, and thanks to LLMs + OCR, you can rapidly support any type of ID card in any shape.
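To make the ID example a little more concrete, here is a minimal sketch in Python of what "unstructured in, structured out" can look like. Everything named here is an assumption for illustration: the prompt wording, the field names, the `IdCard` type, and the `complete` argument, which stands in for whatever LLM client you use (any function that takes a prompt and returns the model's text reply). The `fake_model` function is only a stand-in so the sketch runs without a real model.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical prompt; real prompts usually need tuning for your documents.
PROMPT_TEMPLATE = (
    "Extract the following fields from the OCR text of an ID card and reply "
    "with JSON only, using exactly these keys: full_name, id_number, "
    "expiration_date (YYYY-MM-DD). Use null for anything you cannot find.\n\n"
    "OCR text:\n{ocr_text}"
)

REQUIRED_FIELDS = ("full_name", "id_number", "expiration_date")


@dataclass
class IdCard:
    full_name: str
    id_number: str
    expiration_date: date


def parse_id_card(ocr_text: str, complete: Callable[[str], str]) -> IdCard:
    """Ask the model for JSON, then validate the reply before trusting it."""
    raw_reply = complete(PROMPT_TEMPLATE.format(ocr_text=ocr_text))
    data = json.loads(raw_reply)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    missing = [key for key in REQUIRED_FIELDS if not data.get(key)]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    return IdCard(
        full_name=str(data["full_name"]),
        id_number=str(data["id_number"]),
        # fromisoformat raises ValueError if the date is not YYYY-MM-DD
        expiration_date=date.fromisoformat(str(data["expiration_date"])),
    )


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, so the sketch runs offline."""
    return (
        '{"full_name": "Jane Doe", "id_number": "D1234567", '
        '"expiration_date": "2027-03-14"}'
    )


card = parse_id_card("DOE JANE DL D1234567 EXP 03/14/2027", fake_model)
```

The design choice worth noticing is that the model's reply is treated as untrusted input: it is parsed, checked for missing fields, and converted to real types, and anything that fails raises an error instead of quietly landing in your database.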

Summarizing text content

Summarizing long form text into short form text, while customizing the fidelity of data retained and the tone of voice used, is an incredibly valuable task that LLMs perform well. A great example of this is blogging platforms and news platforms that are now able to generate short summaries of their content without having a human involved in manually writing and subsequently updating the summary. This can allow platforms to do new things that they couldn’t do before, such as sending out email digests to users with smaller digestible versions of longer form content.

Semantic search over text content

The world of search has been fundamentally changed by the advent of generalized LLMs. Ever tried to perform search on a platform like Gmail and been really annoyed that you can’t remember the exact word that was used in an email? You’re so sure that you remember the general idea of the email, but can’t find the right search keyword so the platform just can’t find what you want. Well, that doesn’t need to happen anymore.

Semantic search is when software executes searches based on how close the meaning of your search query is to the meaning of the content being searched. This is the opposite of lexical search, which is when the software tries to find the closest exact text match between your search query and the content. For example, if I’m searching recipes and the recipe says “spicy chicken” - a lexical search would need the word “spicy” to match, but a semantic search could also match based on “hot” or “picante”.
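For the curious, here is a toy sketch in Python of the difference. In a real product, an embedding model turns each piece of text into a long list of numbers that captures its meaning; here I hand-wrote tiny three-number vectors (roughly: heat, sweetness, chicken) purely for illustration, so the numbers, recipe names, and function names are all assumptions, not output from a real model.

```python
import math
from typing import Dict, List

# Hand-written toy "embeddings": [heat, sweetness, chicken].
# A real system would get these vectors from an embedding model.
RECIPE_VECTORS: Dict[str, List[float]] = {
    "spicy chicken": [0.9, 0.1, 0.8],
    "vanilla ice cream": [0.0, 0.9, 0.1],
    "grilled chicken salad": [0.1, 0.2, 0.9],
}

# Toy vector for the query "hot chicken" (also hand-written).
HOT_CHICKEN_QUERY_VECTOR = [0.8, 0.0, 0.7]


def lexical_search(query: str, titles: List[str]) -> List[str]:
    """Return titles that contain every word of the query exactly."""
    query_words = query.lower().split()
    return [
        title for title in titles
        if all(word in title.lower().split() for word in query_words)
    ]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Similarity of direction between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector: List[float],
                    vectors: Dict[str, List[float]]) -> List[str]:
    """Return titles ranked by how close their meaning is to the query."""
    return sorted(
        vectors,
        key=lambda title: cosine_similarity(query_vector, vectors[title]),
        reverse=True,
    )


lexical_hits = lexical_search("hot chicken", list(RECIPE_VECTORS))
semantic_hits = semantic_search(HOT_CHICKEN_QUERY_VECTOR, RECIPE_VECTORS)
```

With these toy numbers, the lexical search for “hot chicken” finds nothing because no recipe title contains the word “hot”, while the semantic search ranks “spicy chicken” first because its vector points in nearly the same direction as the query’s.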

Transcribing audio to text

Another piece of technology that has dramatically improved as a result of LLM research is speech-to-text. This is when you take a video or audio file and generate text from it. For example, YouTube will do this to generate subtitles for videos that do not yet have subtitles. OpenAI’s model called “Whisper” is shockingly good at this. I’ve even used it on audio that had speech that mixed English and Urdu, and it was able to simultaneously extract the text and translate the Urdu to English.

How do I trust the output of LLMs?

The short answer here is: you shouldn’t. Even if the task you’re performing is incredibly well scoped and all your testing shows that the LLM is doing a great job, it is still important to realize that LLMs are a form of machine learning and are therefore subject to some randomness in how they operate.

If you’re trying to use LLMs, you should try to design your product / UX in a way where human verification exists somewhere in the loop. How you should do this exactly depends on your industry and product.

It doesn’t matter whether the human review of LLM generated data happens on your end (e.g. customer support or engineers reviewing outputs before users see them) or on the user’s end, as long as a human looks at it.

One example of this is a client I was working with that wanted to parse unstructured data from random webpages into structured data and save it into their company’s database. We specifically designed the experience so that when the LLM finished parsing the data, the human who triggered it would be required to review it, edit it, and then submit it to the company’s database.

Similarly, Input sends users an email to approve changes to the CRM rather than making any modifications directly. We do this despite using several state of the art techniques recommended in the literature for making updates correct and high quality.
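As a rough illustration of the "human in the loop" pattern described above, here is a minimal sketch in Python. It is not how Input or my client's system is actually built; the `ProposedChange` and `ReviewQueue` names, the dictionary standing in for a database, and the sample values are all hypothetical. The only point is that the LLM's output is parked as a proposal and nothing is written until a person approves it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProposedChange:
    """A change suggested by the LLM that has not been applied yet."""
    record_id: str
    field_name: str
    new_value: str


class ReviewQueue:
    """Holds LLM suggestions until a human approves, edits, or rejects them."""

    def __init__(self, database: Dict[str, Dict[str, str]]):
        self.database = database  # stand-in for a real database
        self.pending: List[ProposedChange] = []

    def propose(self, change: ProposedChange) -> None:
        # The LLM's output only ever lands here, never in the database.
        self.pending.append(change)

    def approve(self, change: ProposedChange,
                edited_value: Optional[str] = None) -> None:
        # A human can accept the suggestion as-is or correct it first.
        if change not in self.pending:
            raise ValueError("change is not awaiting review")
        self.pending.remove(change)
        value = change.new_value if edited_value is None else edited_value
        self.database.setdefault(change.record_id, {})[change.field_name] = value

    def reject(self, change: ProposedChange) -> None:
        # Rejected suggestions are dropped without touching the database.
        self.pending.remove(change)


crm: Dict[str, Dict[str, str]] = {}
queue = ReviewQueue(crm)
suggestion = ProposedChange("contact-1", "phone", "555-0100")
queue.propose(suggestion)          # nothing is written yet
queue.approve(suggestion, edited_value="555-0199")  # human fixes a digit
```

In a real product the approve step would be a button in an email or a review screen rather than a function call, but the shape is the same: propose, review, then write.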

Do I need a machine learning engineer for this?

Even though LLMs are relatively straightforward to work with, they’re still a very novel technology, and engineering quality LLM applications is still a rare skill. Most engineers will claim to be able to utilize LLMs since they are “just making API calls” / “just using OpenAI”. Engineers are generally known to have an unusual level of hubris (speaking from first hand experience), and I’d file this claim under that. I’d also treat engineers who talk like this as large walking red flags.

You definitely don’t need to hire a data scientist or machine learning engineer for LLM applications, but I’d highly recommend going through the usual process of taking a look at the engineer’s portfolio / past work to get a taste for whether they really know what they’re doing.

The realm of possibilities with LLMs is immense, and the realm of what can go wrong is also immense. Hire the wrong team, and you’re likely to end up on the crashing side of the hype cycle.

How do I make sure that my product/company is competitive?

LLMs have changed the world of technology forever, but what they haven’t changed is how the fundamentals of business work. LLMs and AI are not your competitive advantage. Understanding this is the difference between draining your funding vs. making something of value.

If your business is just the “[Company name] but with LLMs” (“Google but with LLMs” or “Uber but with LLMs”, etc), you need to rethink your idea. This was a common red flag in startups even before LLMs. When Uber was getting absurd valuations, every incubator saw a big uptick in “we’re like the Uber for [blah]”. Everyone wants to hop onto the hype cycle.

Fundamentally, your business needs to solve a real problem and deliver value in exchange for money. If the only value you currently have is that some other major player in the space hasn’t realized ChatGPT exists yet, you’re probably missing something. Even the laggards are catching on to this trend now. I would be wary of trying to make money in the gap during which “unnamed big company” hasn’t started using LLMs yet, at least without a really good explanation for why that gap will last.

Remember, OpenAI has mostly commoditized LLMs. Anyone who can get their hands on $10 can get their hands on state of the art LLMs. “Unnamed big company” has a bigger war chest than you, and when they finally catch on to LLMs, they will come out swinging.