AI Foundations for Non‑Tech Professionals
Key Points
- The talk is aimed at non‑technical professionals who work with AI daily (e.g., marketing, sales, product, leadership) and will cover the basics of how AI works and its broader implications.
- Core technical foundations are explained in plain language, focusing on neural networks (pattern‑recognizing artificial neurons, back‑propagation) and tokenization (breaking text into manageable “building‑block” units).
- These fundamentals are then linked to how machines learn—covering concepts like fine‑tuning, context windows, and the emergence of behaviors such as hallucinations when AI operates in real‑world settings.
- The presentation concludes with a look at AI’s social impact in 2024, exploring how large language models are reshaping careers, companies, and society at large.
**Source:** [https://www.youtube.com/watch?v=u7GayC4XTJ0](https://www.youtube.com/watch?v=u7GayC4XTJ0)
**Duration:** 00:32:04

Sections
- [00:00:00](https://www.youtube.com/watch?v=u7GayC4XTJ0&t=0s) **Foundational AI Concepts for Everyone** - The presenter offers a non‑technical overview of core AI technologies, machine‑learning mechanisms, emergent phenomena such as hallucinations, and the 2024 social impact of large language models across roles like tech, marketing, sales, product, and leadership.

Full Transcript
hello today we're going to talk about
foundational Concepts in artificial
intelligence and I'm aiming this
presentation at people who perhaps
haven't had technical Concepts in
artificial intelligence explain to them
but they work with AI every day so if
that's you if you work in Tech if you're
in marketing if you're in customer
success sales product management this
presentation is for you even if you're
in leadership and you want to understand
what is AI so let's dive right into it
here's what we're going to cover today
number one we're going to talk about the
foundational Tech that enables AI so
things like neural Nets Transformers
attention ETC number two we're going to
use that knowledge of foundational Tech
to understand how machines learn so
fine-tuning context Windows all of that
number three we're going to talk about
the emergent behaviors that happen once
these machines start to actually operate
in the real world these are things that
get us excited and or frightened so
Hallucination is one example that we'll
cover and finally we'll talk about the
social impact of AI what can we see here
in 2024 that shapes the way we
understand how llms could influence our
careers how they're influencing our
companies how they're influencing The
World At Large all right let's get into
it foundational Tech what is inside that
magic box and no it's not a ghost it
just sometimes feels like
it all right we're going to we're going
to start with neural Nets neural Nets
are fundamentally machines that we've
built that develop pattern recognition
by mimicking neurons they're artificial
neurons that learn from
experience so like neurons they use
networks to spot patterns learning
happens when they tweak connections
which is what happens in our own brains
and that simple mathematics that tweaks
connections actually yields emergent
understanding if you want to get
technical about it multi-layered
networks of artificial neurons process
input data and they adjust connection
strengths through something called back
propagation to minimize error. Now, back propagation is that simple mathematics we talked about: you just make very simple adjustments to the connection strengths, and that minimizes error and improves pattern recognition over time as neural nets are trained on large amounts of data.
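To make that "simple adjustments that minimize error" idea concrete, here is a minimal Python sketch of the intuition behind back-propagation, shrunk down to a single connection strength; real networks apply the same nudge across millions of connections at once. The function name and numbers are illustrative, not from the talk.

```python
# A toy sketch (not a real library) of the idea behind back-propagation:
# nudge a connection strength ("weight") in the direction that reduces
# error, over and over, until predictions match the data.

def train_single_weight(inputs, targets, weight=0.0, lr=0.1, epochs=100):
    """Fit y ~= weight * x by repeatedly making tiny adjustments to weight."""
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            prediction = weight * x
            error = prediction - y
            # Gradient of squared error (error**2) with respect to the
            # weight is 2 * error * x; step against it to shrink the error.
            weight -= lr * 2 * error * x
    return weight

# The data follows y = 3x, so the learned weight should settle near 3.
w = train_single_weight([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

Each individual adjustment is trivial arithmetic; the emergent behavior comes from doing billions of them across a layered network.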
this foundational concept underlies
everything else that we're going to talk
about, especially transformer architecture. Another key concept I want to talk about is
tokenization tokenization is this idea
of a vocabulary of small units that
helps the machine read relationships at
a larger range of scales so tokenization
actually chops text into little Lego-like building blocks. It helps machines recognize patterns in tenses and other parts of the word, suffixes, prefixes, and it balances in between letters and whole
words if you want the technical
explanation algorithms segment text into
discrete units or tokens like words or
subwords or even characters which are
then converted into numerical
representations so that the model can
process them tokenization is a
prerequisite for everything in text
based Ai and it directly impacts things
like context window size which we'll
talk about later in this
presentation so Transformers we talked
about this earlier when we mentioned
neural Nets fundamentally Transformers
enable machines to consider an entire
block of text at once that means the
machine reads the entire sentence it
doesn't just discover the text word by
word that helps it figure out
relationships between words and juggle
those relationships. As far as I know, ChatGPT is not autonomously on Tinder, although I hear there's a lot of AI-enabled dating right now. The technical
description here is that Transformers
utilize self attention mechanisms to
weigh the importance of different parts
of input simultaneously as they pull the
whole text in and that enables them to
process in parallel across that
relationship set and to capture very
long range dependencies and so part of
why this text processing is important is
if you have a good-sized context window, which we'll get to, you can then
understand how parts of the text that
are somewhat separated actually relate
together humans can read a text and say
oh well this is how the introduction led
to the conclusion that kind of thing is
something that Transformer architecture
enables machines to do as well this is
key to the performance that modern
language models have been able to unlock
and relies heavily on this idea of self
attention which is what we want to get
to next here an llm can
dynamically focus on different parts of
the conversation independently that
means it can listen for connections it
remembers context versus current words
and it can shift Focus as your
conversation with the llm
evolves so technically speaking that
means it's Computing attention scores
between all pairs of input elements
creating a weighted sum that emphasizes
the relevant connections and the textual
information now part of how it does that
on the Fly is based on previous training
where it's done this: it's looked at texts before, it's looked at a lot of training data before, and it's computed a
lot of these scores in the past so that
it understands this is how this kind of
text typically works these are the
relationships that typically apply and
that allows it to you know behave as if
it's seen resumés before when you input
your resume
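The "attention scores between all pairs, creating a weighted sum" description above can be sketched in a few lines of Python. This is a deliberately stripped-down toy: real transformers add learned query, key, and value projections and multiple heads, none of which appear here.

```python
# A toy sketch of self-attention: score every pair of token vectors,
# softmax the scores into weights, and take a weighted sum that
# emphasizes the most relevant connections.
import math

def attend(vectors):
    """For each token vector, blend all vectors by pairwise similarity."""
    outputs = []
    for query in vectors:
        # Score the query against every vector (dot product).
        scores = [sum(q * k for q, k in zip(query, key)) for key in vectors]
        # Softmax: turn scores into positive weights that sum to 1.
        # (Subtracting the max is a standard numerical-stability trick.)
        exps = [math.exp(s - max(scores)) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # Weighted sum over ALL vectors at once -- this is why the model
        # "reads the whole sentence" rather than word by word.
        blended = [sum(w * v[i] for w, v in zip(weights, vectors))
                   for i in range(len(query))]
        outputs.append(blended)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for token embeddings
mixed = attend(tokens)
```

Because every token attends to every other token simultaneously, long-range relationships (introduction to conclusion) fall out of the same computation as adjacent-word relationships.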
all right this mechanism is the crucial
underlying factor for Transformer
architecture. In fact, there's a very famous paper in AI called "Attention Is All You Need", and it's all about why this unlocks so many capabilities in llms. It contributes to the model's ability to understand the entire piece of context that the chat is giving
it we'll get into chats and context
windows in a little bit all
right so those are some foundational
Concepts when we understand those how do
those help us understand how models
learn and how they adapt for different
applications general knowledge is too
General most of the time is one of the
things I want you to take away here
because we have these huge models that
are very general but a lot of the work
being done in AI is basically taking
those foundational models that are
trained on so much data, most of what humanity has created to date, and focusing on taking them to a particular application and tuning them. We'll
describe how that works so the first
concept to understand when you're
talking about this is transfer learning
so if you want to talk about how machines learn fundamentally: pre-training on lots
of General data enables rapid learning
of specialized
subjects so what that means is that AI
is using Knowledge from one task for
other tasks because it has that
foundational knowledge
set it initially needs a massive amount
of data and when I say everything that
Humanity has written to date I really do
mean that it's everything they can find
on the internet it's digitization of
every book they can get a hold of it's a
massive amount of data this enables it
to understand fundamentally how human
language
works there is a lot of suspicion that
llms have figured out intuitively a
grammatical structure that linguists
can't yet formally describe which is a
nice little tidbit for a cocktail party
sometime uh when it's learning it
preserves learned patterns as it looks
at new data and those patterns are
driving really quick learning when it
grasps new subjects so it looks at
things in the light of what it's learned
before if you want to get technical
about it pre-trained models on large
data sets are fine-tuned on smaller and
task specific data
sets and that allows the transfer of
learned features and knowledge across
those domains
efficiently this is closely related to
fine-tuning which we're going to get
into all right
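The transfer-learning idea just described can be sketched with the same toy one-weight model from earlier: learn a weight on broad "general" data, then adjust it briefly, with a lower learning rate, on a small task-specific dataset. All names and numbers here are illustrative assumptions, not from the talk.

```python
# A toy sketch of transfer learning: pre-train on lots of general data,
# then fine-tune from that starting point on a tiny task dataset with a
# lower learning rate. Toy model: y = weight * x.

def fit(data, weight, lr, epochs):
    """Gradient descent on squared error for a single weight."""
    for _ in range(epochs):
        for x, y in data:
            weight -= lr * 2 * (weight * x - y) * x
    return weight

general_data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 20)]  # y = 2x
task_data = [(1.0, 2.2), (2.0, 4.4)]                               # y = 2.2x

pretrained = fit(general_data, weight=0.0, lr=0.05, epochs=20)
# Fine-tuning: fewer epochs, lower learning rate, tiny dataset -- but it
# starts from the pretrained weight, so it only has to move a little.
fine_tuned = fit(task_data, weight=pretrained, lr=0.01, epochs=10)
```

The fine-tuned weight lands between the general answer and the task answer, which mirrors the talk's point: the background knowledge does most of the work, and the task data just nudges it.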
fine-tuning is when you are adjusting
parameters or weights from your general
learning data set to more effectively
fit a task specific data set so you
start with base parameters from general
knowledge you look at examples of a
specific task and you optimize those
weights I think of this as tweaking a
recipe. So technically speaking, a model adjusts pre-trained model parameters using task-specific data, often with a lower learning rate, so it's not
perhaps quite as efficient because it's
a smaller amount of data but it adapts
the model's knowledge to a particular
domain so you get that efficiency back
because there's all of this background
knowledge which is what we talked about
earlier fine-tuning is one practical
application of this idea of transfer
learning and can help mitigate
hallucination and we'll get into talking
about
hallucination so again from from a super
non-technical perspective this is taking
your French omelet recipe and it's
adjusting it just a little bit if you
want to bring in the summer herbs from
the
garden that's the idea of fine tuning
you're fine tuning, especially for an audience: if you're bringing someone over for brunch, well, you want to optimize your omelets. Okay, next concept is context
windows so fundamentally llms can only
ingest a certain number of tokens per
turn there is a hard limit and by per
turn I mean per conversational utterance
that we give them and this is the only
text they directly reference every other
piece of text they've read is encoded as
numbers somewhere uh it's not directly
readable as text current limits are
roughly the size of a big novel
technically speaking it's about 128,000 tokens, you might call it a little bit over 100,000 words, a 300-page book, something like that
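The hard per-turn token limit described above can be sketched in a few lines. The tiny limit and the whitespace "tokenizer" below are illustrative assumptions; real tokenizers split into subwords and real limits are on the order of 128,000 tokens.

```python
# A toy sketch of a context window: the model can only "see" a fixed
# number of tokens per turn, so the chat is re-ingested each time and
# the oldest tokens fall out of the window first.

CONTEXT_LIMIT = 8  # tokens per turn (toy value; real windows are ~128k)

def build_model_input(history, new_message):
    """Re-ingest the whole chat plus the new message, keeping only the
    most recent tokens that fit in the window."""
    tokens = []
    for message in history + [new_message]:
        tokens.extend(message.split())  # crude word-level "tokenizer"
    return tokens[-CONTEXT_LIMIT:]      # oldest tokens decay out first

history = ["hello there", "tell me about neural nets please"]
window = build_model_input(history, "and what about attention")
```

This is also why earlier parts of a long chat stop being directly referenceable: they've literally been truncated out of the only text the model can read.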
there's no direct text reference outside
this window and older inputs in the text
window Decay as new inputs
arrive. Now there is a hack for this, and I note that llms tend to hack this by re-ingesting chats. Part of why Claude will tell you that you can reach a limit in a chat with Claude is because fundamentally, on the back end, Claude is taking the entire chat that you have had about that subject every time you talk and re-ingesting it as a single piece of text. So it's sort of like making the llm reread the entire conversation plus your new chat every time you
talk, and that helps llms to simulate the kind of responsiveness and attention that humans show without having to remember every word of the conversation first. And so when humans
are having a conversation we don't
really think about this I don't exactly
know what our brains are
doing but when we talk all we do is
listen to what's being said and we come
up with a response and we come up with a
response that's contextual right that
matters to the
conversation. When llms are responding, to get to that kind of human fluency, they have to reread the entire chat in the background. And if you didn't know that, that's okay, you never see it happen; it's handled on the back
end. So technically speaking, a context window defines the maximum number of tokens, usually in the thousands (there are really none now that are not in the thousands), that can be processed together, serving as the
model's working memory for generating
contextually relevant
outputs so context window size is
affected by tokenization
and as I've discussed it impacts the
model's ability to maintain coherence in
longer responses and that also matters
because if you're trying to get it to
sort of do these lengthy responses if
you're trying to get it to write your
document for you, it's re-ingesting and rereading the document, and that's why
you sometimes get this idea of like
alphabet soup when you get longer
documents out of AI and that's why I
tend to use AI more often for shorter
tasks for bullets for things that I can
expand into later because it doesn't do
as well with those longer pieces of
text all right let's get into prompt
engineering prompt engineering is all
about guiding an llm toward a desired
response by carefully forming the input
text. You're framing the utterance to jog pattern recall for the llm,
you're providing examples to trigger
specifics and by the way those can be
positive or negative both are helpful
and you're laying out very clear output
expectations
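Those three ingredients, framing, examples (positive and negative), and explicit output expectations, can be sketched as a simple prompt builder. The structure and field names below are illustrative assumptions, not a standard or anything from the talk.

```python
# A toy sketch of prompt engineering as structured input: clear task
# framing, a positive and a negative example, and explicit output
# expectations, all concatenated into the text the model actually sees.

def build_prompt(task, good_example, bad_example, output_format):
    return "\n".join([
        f"Task: {task}",
        f"Good example: {good_example}",   # triggers the right pattern
        f"Avoid this: {bad_example}",      # negative examples help too
        f"Output format: {output_format}", # lay out clear expectations
    ])

prompt = build_prompt(
    task="Summarize the meeting notes for an executive audience.",
    good_example="Revenue grew 12%; two risks need a decision by Friday.",
    bad_example="A long, unstructured retelling of the whole meeting.",
    output_format="Three bullet points, each under 20 words.",
)
```

Nothing magic happens here: the carefully structured tokens just steer the model's next-token prediction toward patterns it has seen paired with that kind of structure before.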
so technically speaking prompt
engineering is designing your inputs
with very specific structures or
examples or instructions that guide the
llm's generation process for new text, the
next token prediction that the llm
relies on and improves response
relevance and accuracy fundamentally
you're saying I can prompt you with a
set of tokens that are carefully
structured such that you are triggered
into a pattern recognition with
particular relationships you've seen
before between tokens in your general
knowledge and then you're able to come
back and recognize that pattern and
predict a set of tokens which is exactly
what they do when they write out that
text on the
screen that will be more relevant right
that will be more useful to me and so
it's all about being able to be inside
the llm system and part of good prompt
engineering is having a good mental
working model of how llms work and
that's why I'm doing this lesson because
I think sometimes we think of them as
black boxes and that doesn't lead to
helpful
prompts Okay so we've talked about
foundational patterns we've talked about
learning let's talk about emergent
behaviors what are the unplanned
behavioral patterns that are
characterizing llms today in 2024 number
one is emergent problem solving models
trained on huge amounts of text are
showing surprising capabilities like
mathematics
understanding now there are limits to
things like mathematics understanding in
a pure
llm
architecture but there are not really
limits that we found if we adjust an llm
and put it into a larger tool chain
approach which I'll get to and sort of
describe and I want to call that out
because sometimes we see emergent
problem solving that is useful for
everyday tasks like I can now use a
large language model in 2024 and if I
say figure out the tip across six people
it generally does it
fine but if I'm solving International
math Olympiad problems it will not do it
fine because the llm is just not able as
a text model to do that right now that
being said the example I gave is still
relevant because Google just won the
silver medal in the international math
Olympiad by using an llm and a tool
chain approach with some other
tools on the back end to solve very very
difficult mathematics problems so this
kind of llm emergent behavior is built
on by humans so that we get more and
more of what we're looking for whether
it's within an llm natively or a larger
architecture so how do llms do these
emergent behaviors like we don't
necessarily design them to be good at
math when we teach them everything we've
written down with human language that
just happened well fundamentally absorb
patterns that llms are good at
understanding have wider applications
we've built this massive pattern
recognition machine and we've given it a
huge amount of data to look
at now when you give it prompts on top
of that data set it can drive a novel
recombination of patterns and so we are
a part of this emergent problem solving
experience too and we may not recognize
that but our unique prompts are
helpful and llms are inherently
because of everything we've discussed
previously very skilled at applying the
patterns they know to new
data so fundamentally what's happening
is that llms when they show behaviors
like mathematical reasoning are not
actually doing mathematical reasoning
they're still doing next token
prediction so they still have the same
complex interactions that drive uh
relationships between models and
extensive training data generating
weights that they can go measure and use
all of that foundational stuff we just
talked about for text is exactly what
they're using when you give them the
what tip should I calculate problem all
that's happening is that they weren't
explicitly programmed to do any kind of
math at all but they figured out a
pattern in their data set that they
could
use and that was so successful it
surpassed the initial expectations of
designers designers didn't really design
llms for math they just found out that
they could do it
when they actually built the system
because there's a pattern in mathematics
that is somewhere in the data set that
is good enough that the llm can use next
token prediction to do
math somewhat reliably right it's again
a pure llm not doing the international
math Olympiad but yeah calculating the
tip for dinner, they can do that; turns out maybe humanity has written about that. In fact, this was reminding me of the Douglas Adams joke about the complexity of calculating tips at a tiny bistro among six friends. Nice
little aside there read Douglas Adams
he's
great so what other emergent
capabilities do we have the other one I
want to talk about is sort of on the
other side we might consider emergent
problem solving as good we might
consider hallucinations as bad I use the
same image for both to remind us that
these are Flips of the coin we are
assigning value here these are just
inherent capabilities of the model and
we need to understand how to use them I
think it's honestly more accurate to say
that llms are very very good at
plausible sounding data because we built
them to be and sometimes we interpret
that as
hallucinations but really all they're
doing is using their next token
architecture they don't think in facts
unless we put a tool chain in place
behind them to help them so pure llms
are only predicting the next
word they optimize for language flow
which is why they're such good
conversationalists, which is why we've been so surprised to see that they've been
used as AI companions first and not in a
lot of other ways that we would expect I
think looking back maybe we shouldn't
have been so surprised about that
because language flow makes them sound
human and again as I was saying
hallucinations can be mitigated using a
toolchain approach that gets into the
fact checking that matters for that
particular
task now we are the
ones that determine what is a
hallucination so the famous example uh
from I think it was Air Canada where the
bot
hallucinated a policy that wasn't
there we assigned the value of
hallucination to the bot's utterance
right the bot just generated a response
that it thought was plausible which is
which it had done every other time and
had a chat with a human in that system
but in that particular case the
plausible sounding response was a policy
that had financial implications for the
airline and we assigned that the value
of hallucination and the airline had
massive news headlines around the world
and had to go and fix it and everything
else the point is we humans need to
think about where we apply the models
that we've designed for next token
prediction because part of this is just
using them the way they should be used
and part of it is building tool chains
that help them to check facts what we
should not be doing is expecting llms to
magically know facts the way we do
because as much as we talk about neural
Nets they're not actually the same as
our brains our brains do have symbolic
logic and reasoning they do understand
what facts are and llms just weren't
built for that and so we shouldn't
expect them
to okay I want to close by reflecting on
how llms are shaping
us this is the first generally
applicable technology in decades so what
are some reliable indicators of how llms
are starting to shape
Society first massive personal
productivity enhancement llms today are
extraordinarily effective accelerating
knowledge workflows I'm going to take a
drink of water
here extraordinarily
effective there was a Danish survey that
came out just a few weeks ago that
talked about something like a 37%
reported Improvement in personal
productivity for knowledge workers in
Denmark based on
llms and it turns out that that's
because tech worker tasks (knowledge worker tasks) are pattern-driven. llms can
tackle many many of these tasks even
without tuning because they're in the
original data set which is why Tech
workers all over the world have a chat
GPT window open or another language
model of their
choice and I would say that that
individual productivity is leading to
based on my anecdotal kind of read of
the situation
team and/or org productivity. One of the
examples here in Tech right now is ChatPRD, which started out as an application for product managers and individual productivity and has recently added team plans and team capabilities, and that is
just the tip of the iceberg like what
we're seeing is that llms are following
the path of least resistance where we
start with individual productivity
because it's designed as a call and
response it's designed as an utterance
and a responsive next token prediction
and you have to think about larger
architectures of software to build
effective team productivity off of that
basic flow but it's absolutely
possible. So watch that space: you should expect a lot more team and org productivity enhancements in
2025 this is also changing our skill set
patterns so Tech worker skill sets
knowledge worker skill sets have to
shift and we need to think about
allocating intelligence versus just
applying our own intelligence what is
appropriate for an llm as an
intelligence versus our brain as an
intelligence when do we use which is
becoming a hot skill in Tech markets
today so you have to know when to use an
llm and I would argue you also have to
understand that your value as a human brain has shifted from starting with drafting a good draft to editing and
polishing because llms are so good at
drafting you no longer have to face a
blank page problem and that is a 10,000
year old at least problem that is gone
we just don't have it anymore and I
think that's really cool and
fundamentally a lot of the sort of
rubric for allocating intelligence is
having an internal quality dial and
thinking about what are llms good at
what are brains good at where is the
quality meter for this task and what do
I need to get to that quality meter most
efficiently so I would expect that AI
interaction AI oversight AI creative
problem solving are things that we will
continue to need to get better at I
think building and sort of getting into
code is something that we are going to
do more and more because llms have been
the greatest code unlock that I think
I've ever seen they've simplified
people's access to coding as a language
because they look at coding as next
token prediction too that's another
emerging capability that they weren't
necessarily designed for
always check your code coming out of an
llm by the way not always bug free just
like a
human all right another Factor here is
pressure to
monetize and that little guy there he's
back he's thinking about the pressure to
generate return on investment for Wall
Street corporations are on track to
spend roughly a trillion dollars on llms
all in over the next couple of years and
I'm factoring in both the costs that you
have at the end of the stack where
corporations are applying Ai and the
estimate also includes the huge
foundational model Investments and
training Investments for building models
that are are coming up if you're
spending that much money there's
pressure to generate a return on that
investment we have not seen anything
like that return there was a paper that
came out by Sequoia just a month or so
ago in June of 2024 that talks about
this $600 billion shortfall where we
have a gap between what we've spent on
llms and what we would expect to get as a
return and it's not been
closed and where we have applied llms
our Focus has been on efficiency not
growth and I think that is an issue that
is associated with where we were at in
the economic cycle when llms first broke
we were not at a high point in the cycle
and leaders were looking for things that
helped them drive cost efficiencies and
llms are something that you can get cost
efficiency out of if if you put them in
the right place in the org potentially
and so that's where leaders have pigeonholed them, but Business Leaders need to
understand that creative applications
and growth drivers are probably the
bigger long-term value window here for
llms and we need to think more broadly
about what llms can do in order to
realize a massive return on investment
that would justify the cost of creating
them in the first place right now most
monetization is proxied via the cloud so
if you're using cloud or compute to
drive llm fine-tuning if you're using it
to drive direct usage of large models
like open AI that
charge fundamentally that's turned into
like Cloud Revenue at the Enterprise
level so Enterprises will get a
fine-tuned model that derives from chat
GPT or that derives from some other
model they'll put it on a cloud
somewhere maybe it's on Azure maybe it's
on Google Cloud maybe it's on
AWS, and then the cash register
Rings the way it usually does for cloud
spend every time they use it right and
that's how some of that monetization
happens so I want you to just keep that
one in mind because that one is going to
continue to be a factor in the
background of executive decisions around
AI until we start to see substantial
money back. That, for example, is why Andy Jassy decided to start charging for
Alexa that's something that's apparently
coming very soon and it's because he can
no longer tolerate the cost on the
balance sheet without direct return on
the AI investment for
Alexa. All right, new security risks: llms
are breaking our mental models of
deterministic security So Bad actors may
never have been as simple as this idea
of someone in a gray hat sitting at a
computer far away typing in and sort of
sending green lines of code charging
across the screen the way we had in the
movies but security risks were
deterministic they were driven by code
and the intent of a particular
hacker these days they're more emergent
we can have things like hallucinations
as drivers of liability we talked about
this with the Air Canada
example. We can have voice mimicry that's autonomous as a new spear-phishing vector; people are using this in the wild
already and it's concerning the ability
to mimic a voice becomes the ability to
mimic identity socially if you apply it
in the right context and that generates
new attack vectors and new
vulnerabilities for all of us finally
the scale at which you can create
information and llms' propensity to focus
on the flow of conversation rather than
the facts means that it's never been
easier to create disinformation and that
risk which has already been probably
100x just from llm text outputs is 100x
again when you throw in the potential to
generate video and images because we are
moving to a much more visual society. TikTok is exploding, Instagram exploded back
in the 2010s and people are consuming
stuff via video you're watching this via
video I promise I'm not an AI
Avatar when you look at that General
pattern it's easy to get afraid wow we
have new vulnerabilities we have new
attack vectors there are new security
risks that's all
true
but you have to look at a new technology
in terms of the overall impact to
society and one of the things that I
think is really compelling is that llms
like most general purpose Technologies
are being used both for incredibly
positive applications they're being used
in drug development and also for
applications that are negative for
society like we're discussing here on
this
slide and that's what happens when a new
general purpose technology comes out and
it's up to us as individuals as
communities as companies to make use of
llms in ways that benefit us as a whole
and
that is a nice turning point we are
likely living at this moment in history
that we will look back on as a huge
inflection point for our species and I'm
not talking about that with the
assumption that we're going to get
artificial general intelligence or we're
all going to be like under skynet's rule
shortly I'm actually saying just what we
have today is showing us that large
language models may well go down
as Humanity's greatest invention when
all is said and done people may look
back on this time and say wow I wished I
lived during that time when everything
was new and everything was on the
horizon everything was something that we
could shape that is the opportunity that
is the challenge of large language
models here in 2024 so there you go I
hope you understand large language
models a little bit better I hope you
have a sense of how you can use them and
uh I'll see you in the next lesson