The 3-Layer Generative AI Cake

How the layers of AI help you build better AI apps

When I was growing up, I would get ice cream cakes for my birthday. Yes, I know it’s not really cake, more like a big block of ice cream molded into a rectangle with icing and candles on top. But I loved ice cream way more than cake.

It was much later in life when I appreciated the joys of real cake, the type with spongy layers and icing all over. I really loved the steakhouse version of cake, served in slices bigger than my head. No over-the-top dinner was ever complete without ordering a huge mountain of sugar-laden dessert.

Yes, I am a glutton who likes massive slices of cake!

It was also the point in my life when I got curious about the science of baking and constructing cakes. Some of the steakhouse cakes had an incredible number of layers. Then there are mille crepe cakes with cream in-between each layer. While not a thousand layers as the name implies, it’s still impressive.

The best cake though is the quintessential three-layer cake. It is high enough to look impressive when displayed, well-balanced between layers of cake and icing, and just the right portion size when sliced. There really is no better architecture for a cake*.

There is something wholly satisfying about having three of a thing. The number three comes up often in spiritual texts representing divinity. Ideas are better communicated in groups of three at a time. Thirds are a common structure in music. And in technology, there is a completeness when discussing systems such as three-tier architectures or the three components of technology (hardware, software, data).

I get a lot of questions right now from startups about AI, especially Generative AI. The term has gone from barely an idea to dominating the industry in two years. For founders not using AI or machine learning before ChatGPT, it can seem like either an overwhelming new domain or an overhyped fad.

The challenge is that Generative AI feels opaque. We know that it is driven by these things called large language models (LLMs), and the main user interface is a chat window that you use to prompt the model. You can also access LLMs through an API to send requests programmatically of varying sizes, up to a maximum set by the context window, and priced in this currency called tokens.
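To make those mechanics concrete, here is a minimal sketch of a programmatic request using Anthropic’s Python SDK (the model name and prompt are illustrative, not recommendations): the request and response are both measured in tokens, max_tokens caps the output, and the whole exchange has to fit within the model’s context window.

    # Minimal sketch of calling an LLM via an API (Anthropic's Python SDK).
    # The model name and prompt are illustrative placeholders.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # which model handles the request
        max_tokens=500,                      # cap on output tokens
        messages=[{"role": "user", "content": "Explain three-tier architecture."}],
    )

    print(response.content[0].text)
    # Usage metadata shows the billable currency: input and output tokens
    print(response.usage.input_tokens, response.usage.output_tokens)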

Then there is the explosion of applications all advertising that they are built to use Generative AI. The marketers are all over the content creation tools while developers are plugging coding assistants into their IDEs. In fact, for every role and function in a company, from accounting to HR to support, there are already entire market maps overflowing with logos of startups providing tools.

With any new technology though comes a lot of questions, uncertainties, and risks. Generative AI has already faced bad press for poor results, including providing completely wrong answers. For indie hackers, that might be okay. For enterprises and startups tackling problems in regulated spaces like healthcare and finance however, the risks are unacceptable. Founders and developers need a much deeper and more nuanced understanding of the technology behind Generative AI in order to make rational decisions about the options, costs, and trade-offs.

Which brings us back to our three-layer cake from before. The easiest way to think about Generative AI is to view it as an architecture comprising three layers of a stack:

  • Compute – the engine that powers the creation and training of LLMs

  • Models – the actual LLMs (or more broadly, foundation models, FMs)

  • Applications – the act of using LLMs to complete tasks and work

What does this look like from a practical standpoint though? Let’s dive into each of these layers using the cloud provider I know best as an example, AWS:

How AWS views the three layers of the Generative AI stack

COMPUTE

Training AI models from scratch requires raw compute horsepower. Other than Bitcoin mining, there are few operations more compute intensive. All this compute also has a sustainability impact. Estimates say that servers dedicated to AI will use up to 134 terawatt-hours (TWh) of energy per year by 2027, comparable to the annual power usage of Norway or Sweden.

In order to manage compute and sustainability costs while providing optimal performance, models need compute that is tuned for AI workloads. AWS has invested over the past several years in creating chips for this very purpose: running AI training and inference at scale. These AI accelerators, Trainium and Inferentia, are packaged as specialized EC2 instance types that significantly bring down the cost of AI workloads while also increasing performance and lowering energy usage compared to other instance types.

MODELS

This is the bread and butter of Generative AI. Some of the more popular models include Anthropic’s Claude, AI21’s Jurassic, Meta’s Llama, OpenAI’s GPT-4, and Cohere’s Command. There are also multimodal models that combine text, images, audio, and video as inputs and outputs, such as Stable Diffusion from Stability AI, Midjourney, and DALL-E 2 by OpenAI. Then there are numerous open-source models such as Mistral, Falcon, and other models listed on Hugging Face.

While some use cases may require building a foundation model from scratch, most simply require using existing models and adapting them with specific data and use cases. This is where Amazon Bedrock helps by providing access to multiple model providers via the AWS console or API. This allows developers to test and implement the model that best suits their needs, swap models when better ones or newer versions become available, and customize them through fine-tuning or Retrieval-Augmented Generation (RAG) to provide more detailed and accurate results.
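As a rough sketch of what that looks like in practice (the model ID and region are placeholders), here is how a developer might call a Bedrock-hosted model with boto3’s Converse API. Swapping providers is largely a matter of changing the modelId string.

    # Hedged sketch: invoking a model through Amazon Bedrock with boto3.
    # The model ID and region are placeholders; trying another provider's
    # model is mostly a matter of changing modelId.
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[
            {"role": "user", "content": [{"text": "Summarize this support ticket: ..."}]}
        ],
    )

    print(response["output"]["message"]["content"][0]["text"])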

What if you want AI to do several things in a row, like recommend and book travel or process an insurance claims form? Bedrock also enables developers to build Agents that chain actions together and fully automate workflows based on the inputs and outputs of the model.
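A rough sketch of calling such an agent is below (the agent and alias IDs are placeholders, and the agent’s available actions are assumed to have been configured in Bedrock beforehand):

    # Hedged sketch: invoking a pre-configured Bedrock Agent. The IDs are
    # placeholders; the agent's actions and workflow are defined in Bedrock.
    import boto3

    agents = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

    response = agents.invoke_agent(
        agentId="AGENT_ID",              # placeholder
        agentAliasId="AGENT_ALIAS_ID",   # placeholder
        sessionId="demo-session-1",      # preserves context across steps
        inputText="Book me a flight to Las Vegas next Tuesday.",
    )

    # The agent's reply streams back in chunks
    for event in response["completion"]:
        if "chunk" in event:
            print(event["chunk"]["bytes"].decode("utf-8"), end="")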

APPLICATIONS

This is where the outputs of Generative AI come to life. Applications take the standard chatbot interface that launched the current AI hype cycle and mold it into an actionable and tightly integrated experience for users.

One of the most common categories of applications to launch is the code assistant. Amazon Q Developer is one of many coding tools, along with GitHub Copilot, GitLab Duo, and Cursor. With Q Developer, AWS lets developers work in their IDE to ask for assistance via a sidebar chat window, get in-line real-time code suggestions, troubleshoot bugs and errors, transform legacy code to newer versions, and use agents to complete more complex coding tasks.

Beyond code assistants however, AWS also offers applications for business usage. Amazon QuickSight adds Generative AI to enable users to build and customize analytics dashboards using natural language. Amazon Connect provides call center agents with real-time recommendations, using AI and relevant data to better assist customers. Then there is Amazon Q Business, which connects to and pulls data from multiple internal systems to deliver insights to users via chat or through applications built with Q Business no-code tools.

The three layers of the Generative AI cake

There are also two other important components we need to complete our Generative AI layer cake: the icing and a stand to display our geeky cake. In this case, the icing is Security and the stand is your Cloud Infrastructure.

SECURITY

One of the major knocks against Generative AI is that it is non-deterministic, creating unpredictable outputs even when given the same input multiple times. There are also valid questions about the privacy and security of the data used to train models, as well as of the data users submit to them.

In the AWS world, security is infused into each layer of our Generative AI cake, ensuring usage of Generative AI does not introduce unmanageable risks:

  • Compute layer – AWS Nitro separates the traditional elements of the hypervisor, creating greater flexibility to deliver new instance types (like the AI instances mentioned earlier) that are also more secure because access to these resources is locked down.

  • Model layer – Bedrock cordons off customer data from the model provider so no data is shared. There is also an added feature called Guardrails to reduce hallucinations, minimize harmful content, and apply custom safety, privacy, and truthfulness protections around the models (see the sketch after this list).

  • Application layer – Q Developer incorporates both security scanning & code references to ensure that suggested code is free of vulnerabilities and that any open-source code being used is properly cited.
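For the Guardrails piece, here is a rough sketch of attaching a pre-configured guardrail to a Bedrock request (the guardrail ID and version are placeholders created ahead of time in the Bedrock console):

    # Hedged sketch: applying a Bedrock Guardrail to a model request.
    # The guardrail ID and version are placeholders.
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[
            {"role": "user", "content": [{"text": "Draft a reply to this customer."}]}
        ],
        guardrailConfig={
            "guardrailIdentifier": "GUARDRAIL_ID",  # placeholder
            "guardrailVersion": "1",
        },
    )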

INFRASTRUCTURE

Of course, none of this matters if the tools of Generative AI sit outside of where your applications live (or at least it makes things a whole lot harder). Compute, networking, storage, security, and databases supporting your applications should ideally be on the same platform as where you are running your Generative AI workloads. This is your “cake stand” that makes it easier to manage the entire infrastructure and reduces complications.

Now that the various components of Generative AI have been laid out, how are you going to use these layers to build your Generative AI-enabled startup? That is for a future newsletter, but in the meantime, let me know your thoughts on our three-layer Generative AI cake and whether I missed anything!

Mark

*For those of you saying cheesecake is a one layer cake, cheesecake is not a type of cake.

Everyone is talking about Founder Mode. I cannot recall any startup idea going as viral as fast as Paul Graham’s latest essay. Here is a brief synopsis of what he wrote and why it matters:

  • Challenging Convention - The standard advice of "hire good people and give them room" hinders growth. Founders are finding success by staying deeply involved.

  • Skip-Level Engagement - Unlike the modular approach of manager mode, founder mode often involves direct interaction across multiple levels of the organization.

  • Customized Scaling - There's no playbook, so successful founders are crafting unique approaches to maintain startup agility at scale.

  • Potential Pitfalls - Beware of using "founder mode" as an excuse for micromanagement, or of non-founder execs & managers trying to adopt this style.

  • Uncharted Territory - There is plenty of knowledge and resources for manager mode, but we're just beginning to understand founder mode and how this shapes leadership in startups.

The key takeaway is that as startups scale, it is crucial for founders to maintain the vision and remain engaged for sustained success. Questions you should ask yourself based on your role in the startup:

  • Founders - How are you navigating the balance between hands-on leadership and delegation as you scale?

  • Employees - How does working in a "founder mode" environment impact your role and the company culture?

  • Managers - How do you adapt your management style to align with founders while still leading your teams?

If you read Paul’s post, do you have experiences to share about Founder Mode versus Manager Mode in action? Given the vastness of the topic, we will definitely revisit it in a later newsletter, along with some of the implications of implementing this concept in startups.

I was skimming through my book Community-in-a-Box recently and I’m happy to report that it still holds up well after four years. But there is also a lot more that I have learned since about building, scaling, and sustaining communities.

Community-in-a-Box has traveled the world!

The world has changed quite a bit as well since the first edition. Community practices have evolved, newer community tools have emerged, and Generative AI has impacted how we support and engage communities. I am seeing this play out across startup & developer ecosystems during my travels with AWS.

Therefore, I have decided to publish a second edition of Community-in-a-Box! That means I will be reaching out to community leaders, builders, and managers for their thoughts on what it takes to build and scale healthy communities.

If you think you can contribute, share your thoughts in this survey. All the details are in the survey, and the last day to submit a response is Sept 17. Ping me if you have questions, and thanks again for your help!