Data Contracts to Prevent Downstream Errors
Key Points
- A new data engineer's downstream users were missing critical data; the problem originated in an upstream system, not in his own team.
- The speaker recommends using **data contracts**—formal agreements between data producers and consumers—to improve documentation, data quality, and service‑level agreements.
- Implementing data contracts helps lower AI costs by preventing “garbage‑in‑garbage‑out” scenarios and reducing the need for frequent model retraining.
- The **Open Data Contract Standard**, backed by the Linux Foundation, defines eight sections (demographics, dataset & schema, quality rules, pricing, stakeholders, security, SLA, and custom properties) to structure these agreements.
- Applying such a contract would have given the engineer clear quality rules and SLAs, preventing the issue and ensuring downstream users received the data they needed.
**Source:** [https://www.youtube.com/watch?v=-n3OD-ml_k0](https://www.youtube.com/watch?v=-n3OD-ml_k0)
**Duration:** 00:03:01
Sections
- [00:00:00](https://www.youtube.com/watch?v=-n3OD-ml_k0&t=0s) **Upstream Data Contracts Resolve Breakdowns** - The speaker explains that data contracts (agreements between data producers and consumers), structured with the Open Data Contract Standard, improve documentation, quality, and SLAs, preventing downstream data shortages and reducing AI retraining costs.
Full Transcript
My son started a new job as a data engineer. The other day, he called me in the middle of the afternoon, a little panicked, and he never really calls. His downstream users were not very happy. The problem: they weren't getting the data they needed for sensitive reports. But little did they know that the issue was not with his team; it was coming from the upstream system. Does this sound familiar to you? I have heard similar stories many, many, many times. One way to solve this issue is to use data contracts.

So what is a data contract?
It's an agreement between a data producer and one or many data consumers, and they share a data contract.

So why do we use a data contract? It's because we want better documentation, we want better data quality, and we want better SLAs. And the ultimate goal of that is really to lower the cost of AI: you get better data in your system, so you avoid garbage in, garbage out, and you don't have to retrain your models.

To do that, we follow a standard called the Open Data Contract Standard, and it is a standard backed by the Linux Foundation. It is composed of eight sections. Demographics, which is really your name, your version, and detailed information about your data contract. Then you've got your dataset and schema, representing what the data is about, associated with your data quality rules. You've got the pricing section, which is currently experimental, but if you want to share your data within your organization or outside your organization, you can specify rules. You've got stakeholders, where you see how the contract has evolved through the different people involved in the creation and maintenance of the contract. You've got rules for security and access, the service-level agreement, and custom properties for future reference and extension.
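As a rough illustration of the eight sections listed above, a contract can be sketched as a plain Python dictionary. The key names, the `orders` table, and the `missing_sections` helper are assumptions made for this example; they mirror the section names used in the talk and are not the official field names of the Open Data Contract Standard.

```python
# Illustrative only: the keys mirror the eight sections named in the talk,
# not the official field names of the Open Data Contract Standard.
REQUIRED_SECTIONS = (
    "demographics",
    "dataset_and_schema",
    "quality_rules",
    "pricing",
    "stakeholders",
    "security",
    "sla",
    "custom_properties",
)

# A hypothetical contract for a hypothetical "orders" table.
contract = {
    "demographics": {"name": "orders_contract", "version": "1.0.0"},
    "dataset_and_schema": {
        "table": "orders",
        "columns": {"order_id": "string", "amount": "float"},
    },
    "quality_rules": [{"column": "order_id", "rule": "not_null"}],
    "pricing": {},  # described as experimental in the talk; left empty here
    "stakeholders": [{"name": "upstream-team", "role": "producer"}],
    "security": {"roles": ["report_reader"]},
    "sla": {"max_delay_hours": 24},
    "custom_properties": {},
}


def missing_sections(candidate: dict) -> list[str]:
    """Return the required section names that are absent from the contract."""
    return [name for name in REQUIRED_SECTIONS if name not in candidate]


print(missing_sections(contract))  # an empty list means all eight are present
```

Keeping the section list in one tuple means a producer and its consumers can run the same completeness check before agreeing on a contract.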
So, coming back to my son: this would have helped with his problem. By having better data quality rules and better SLAs, he would have prevented the issues and given his customers the data as they expected. And hopefully, next time he calls me, it's just to say, "Hello, Papa."
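To make the point about quality rules and SLAs concrete, here is a minimal sketch of a check a consumer could run on each delivery from the upstream system. The function name `check_delivery`, the column names, and the 24-hour freshness limit are all hypothetical; the talk does not prescribe any particular implementation.

```python
from datetime import datetime, timezone


def check_delivery(rows, required_columns, last_loaded_at, max_delay_hours, now=None):
    """Return a list of human-readable contract violations (empty if none)."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Quality rule: every required column must be present and non-null.
    for index, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) is None:
                violations.append(f"row {index}: '{column}' is missing or null")

    # SLA: the data must have been refreshed within the agreed delay.
    age_hours = (now - last_loaded_at).total_seconds() / 3600
    if age_hours > max_delay_hours:
        violations.append(
            f"SLA breach: data is {age_hours:.1f}h old, limit is {max_delay_hours}h"
        )
    return violations


# Hypothetical delivery: one null key, and data that is 30 hours old.
rows = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": None, "amount": 5.0},
]
problems = check_delivery(
    rows,
    required_columns=["order_id", "amount"],
    last_loaded_at=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    max_delay_hours=24,
    now=datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc),
)
for problem in problems:
    print(problem)
```

With a check like this running at the boundary between the two systems, a failure points at the upstream delivery rather than at the team that happens to sit closest to the unhappy users.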
Thanks for watching. Before you leave, please remember to hit like and subscribe.