What "don't train on my data" actually means
When a consumer AI product says it "doesn't train on your data," it typically means that your inputs are not used to update the underlying model's weights. This is a real and meaningful protection: your sentence about a specific client matter will not influence what the model says to a different user tomorrow.
What it does not mean, by default, is any of the following: that your inputs aren't logged, that they aren't reviewable by the vendor's staff for safety/abuse purposes, that they aren't retained for some period, that they aren't used to evaluate model performance, or that they live in a particular geographic region. Each of those is a separate question, with a separate answer, often spelled out only in the vendor's enterprise terms — not the consumer-facing privacy policy.
The four-bucket framework
When you're considering submitting any client information to an AI tool, run the inputs through this:
- Public information that wouldn't matter if the world saw it. Examples: cite-checking a brief against published cases, or drafting boilerplate terms that name no parties. Generally fine, though see the broader competence questions in Rule 1.1.
- Client information that the client has consented to having processed by AI. Permitted, with attention to the specific scope of consent. Document it.
- Client information that's confidential under Rule 1.6 but for which informed consent hasn't been given. Permitted only with a vendor whose terms genuinely protect the data — and in many states, only with disclosure to the client.
- Information that's privileged or particularly sensitive, or whose disclosure to a third party could waive privilege. Treat with the highest care. Use only enterprise tools with strong terms, document your due diligence, and consider whether the use case justifies the risk.
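For anyone building this triage into an intake tool, the four buckets can be sketched as a small classifier. This is a minimal illustration, not a compliance mechanism: the names (`Bucket`, `ClientInput`, `classify`, `HANDLING`) are hypothetical, the three yes/no flags are a simplification of judgments a lawyer still has to make, and the choice to check the most restrictive bucket first is an assumption rather than something the framework itself dictates.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    PUBLIC = 1
    CONSENTED = 2
    CONFIDENTIAL_NO_CONSENT = 3
    PRIVILEGED_OR_SENSITIVE = 4


@dataclass
class ClientInput:
    # Hypothetical flags; each one stands in for a human judgment call.
    is_public: bool
    client_consented_to_ai: bool
    privileged_or_highly_sensitive: bool


# Handling notes paraphrased from the four buckets above.
HANDLING = {
    Bucket.PUBLIC: "Generally fine; keep Rule 1.1 competence in mind.",
    Bucket.CONSENTED: "Permitted within the scope of consent; document it.",
    Bucket.CONFIDENTIAL_NO_CONSENT: (
        "Only with a vendor whose terms protect the data; "
        "in many states, only with disclosure to the client."
    ),
    Bucket.PRIVILEGED_OR_SENSITIVE: (
        "Enterprise tools with strong terms only; document due diligence "
        "and weigh whether the use case justifies the risk."
    ),
}


def classify(item: ClientInput) -> Bucket:
    # Assumption: the most restrictive bucket wins, so privileged material
    # is never downgraded just because the client also consented.
    if item.privileged_or_highly_sensitive:
        return Bucket.PRIVILEGED_OR_SENSITIVE
    if item.is_public:
        return Bucket.PUBLIC
    if item.client_consented_to_ai:
        return Bucket.CONSENTED
    return Bucket.CONFIDENTIAL_NO_CONSENT


if __name__ == "__main__":
    memo = ClientInput(
        is_public=False,
        client_consented_to_ai=True,
        privileged_or_highly_sensitive=True,
    )
    bucket = classify(memo)
    print(bucket.name, "->", HANDLING[bucket])
```

The useful part of a sketch like this is less the code than the forced ordering: deciding which bucket wins when an input fits more than one is a policy choice your firm should make deliberately.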
Vendor questions to ask before you submit anything sensitive
Before paying for or relying on any AI tool that handles client information, get written answers to every question below. If a vendor can't or won't answer, that itself is the answer.
- Will my inputs or outputs be used to train your model? Now and in the future?
- How long are inputs and outputs retained? Where can I see this in the contract?
- Who at your company can see my data, under what circumstances, and is there logging?
- Where geographically is the data stored? Where is it processed?
- Do you offer a contractual commitment to confidentiality, indemnification, or both?
- What happens to my data if my account is closed or you're acquired?
- What is your security certification posture (SOC 2 Type II, ISO 27001, etc.)?
- Is there a sub-processor list? Does my data ever go to other AI providers downstream?
- What is your incident-disclosure timeline if there's a breach?
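If you track vendor diligence in a tool rather than a spreadsheet, the list above can be treated as a checklist that blocks approval until every item has a written answer. The sketch below assumes hypothetical short keys for the nine questions and a hypothetical `written_answers` mapping from key to the vendor's written response. It checks only that an answer exists, not that the answer is acceptable; that review remains a human judgment.

```python
# Hypothetical short keys, one per question in the list above.
VENDOR_QUESTIONS = [
    "training_use",
    "retention",
    "staff_access_and_logging",
    "storage_and_processing_location",
    "confidentiality_and_indemnification",
    "account_closure_or_acquisition",
    "security_certifications",
    "sub_processors",
    "breach_disclosure_timeline",
]


def unanswered(written_answers: dict) -> list:
    """Return the questions that have no non-empty written answer."""
    return [
        q for q in VENDOR_QUESTIONS
        if not (written_answers.get(q) or "").strip()
    ]


def vendor_cleared(written_answers: dict) -> bool:
    # A missing answer is treated as a failed answer, per the rule above.
    return not unanswered(written_answers)


if __name__ == "__main__":
    answers = {"training_use": "No, per the enterprise agreement."}
    print("Cleared:", vendor_cleared(answers))
    for question in unanswered(answers):
        print("Still need a written answer for:", question)
```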
Consumer plans vs enterprise plans
There is a real and consequential difference between the consumer version of an AI product (typically a $20/month seat with a checkbox-level privacy policy) and the enterprise/business version (typically a per-seat or per-organization plan with a real contract, training opt-out, retention guarantees, and admin controls). For client work, the consumer version is generally not appropriate. The enterprise version, with terms reviewed, often is.
ABA Opinion 512 doesn't quite say this in those words, but it gets close: what counts as "reasonable efforts" to prevent unauthorized disclosure scales with the sensitivity of the information. The default consumer plan of a major AI provider does not represent reasonable efforts when you're processing privileged material; the same provider's enterprise plan, with the right contract, often does.
When to disclose AI use to the client
Disclosure expectations vary. The trend across recent state bar opinions is toward disclosure when AI use is material — when it affects cost, timeline, or the nature of the work product. Florida Opinion 24-1 leans toward more disclosure rather than less. California's Practical Guidance is comparatively flexible. ABA Opinion 512 stops short of a blanket rule but emphasizes Rule 1.4's communication duty in cases where the use is significant.
The clean default: address AI use in your engagement letter going forward. Treat it like any other operational decision the client has a right to know about. A boilerplate paragraph saying "the firm uses AI tools for [specific purposes] under [specific protections]; you may opt out by writing to us" gets you most of the way home for most matters.
Where BuildLegal sits in this
Tools you build here run in isolated sandboxes. Your project's data does not train any underlying model. We've published our own data practices alongside the product. That said: when the tool you build itself calls out to an AI provider (for example, to summarize a document), the data flows to that provider under that provider's terms. Pick the integration vendor with the same scrutiny you'd use to pick any AI vendor for sensitive work.