1) Email and shared inboxes
Email is the enterprise’s largest unofficial database. Approvals, exceptions, vendor negotiations, and customer promises live in threads. Attachments multiply the problem: PDFs, spreadsheets, and photos that never become rows. The fix is not “ban email.” The fix is classification plus extraction plus retention policy tied to legal holds.
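As a rough illustration of "retention policy tied to legal holds," the sketch below decides whether a classified message may be deleted. Everything in it is an assumption for illustration: the category names, the retention periods, the `Message` shape, and the idea that an upstream classifier has already assigned `category`. The only rule it is meant to show is that an active hold always beats the retention clock.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per message category, in days.
RETENTION_DAYS = {
    "approval": 7 * 365,
    "vendor_negotiation": 5 * 365,
    "general": 365,
}


@dataclass
class Message:
    message_id: str
    category: str                      # assumed output of an upstream classifier
    received: date
    matter_ids: frozenset = frozenset()  # legal matters this message is linked to


def may_delete(msg: Message, today: date, active_holds: set) -> bool:
    """Return True only if retention has expired and no legal hold applies."""
    if msg.matter_ids & active_holds:
        return False  # a legal hold overrides the retention clock
    days = RETENTION_DAYS.get(msg.category, RETENTION_DAYS["general"])
    return today > msg.received + timedelta(days=days)
```

A message tagged to a matter under hold stays put no matter how old it is; everything else follows the schedule for its category.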
2) CRM and ticketing notes
Your CRM has structured fields for stage and amount, but the truth is often in rep notes: why a deal stalled, which competitor appeared, what legal pushed back on. Ticketing systems repeat the pattern for support and implementation. Notes fields are unstructured by design, and they rot unless you selectively structure them for search, analytics, and agent tools.
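"Structure selectively" can start very small. The sketch below pulls two fields out of a free-text note, competitor mentions and stall reasons, and keeps the raw note alongside them. The competitor names, the reason labels, and the keyword patterns are all placeholders we made up; a real program would derive them from its own notes and would likely outgrow keyword matching.

```python
import re

# Hypothetical vocabularies; replace with terms drawn from your own CRM notes.
COMPETITORS = ["Acme", "Globex"]
STALL_REASONS = {
    "legal": r"\b(legal|redline|indemnit\w*|liability)\b",
    "budget": r"\b(budget|pricing|discount)\b",
}


def structure_note(note: str) -> dict:
    """Extract a few structured fields from a note without discarding the text."""
    competitors = [
        name for name in COMPETITORS
        if re.search(rf"\b{re.escape(name)}\b", note, re.IGNORECASE)
    ]
    reasons = [
        label for label, pattern in STALL_REASONS.items()
        if re.search(pattern, note, re.IGNORECASE)
    ]
    return {"competitors": competitors, "stall_reasons": reasons, "raw_note": note}
```

Keeping `raw_note` next to the extracted fields means search and analytics get columns to filter on while a reviewer can still check each tag against the original wording.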
3) Contract and procurement folders
Even when a contract management tool exists, reality includes legacy folders, final_final PDFs, and side letters stored in drives without metadata. Your team knows the filenames better than the database does. Surfacing this content matters for renewals, obligations, and pricing escalators, especially when leadership asks a question that requires citations, not vibes.
4) Call centers and operational recordings
Calls and chats contain consent-sensitive data and high-variance phrasing. They also contain the ground truth for why customers churn and what agents do under pressure. Modern stacks transcribe, redact, and summarise with strict access control. The failure mode is storing transcripts without retrieval discipline, which just creates another haystack.
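To make the "redact" step concrete, here is a deliberately simple sketch that masks email addresses and long digit runs (phone or card-like numbers) in a transcript before it is stored. The two patterns and the placeholder tokens are our assumptions, and pattern matching alone will miss names, addresses, and spoken-out numbers, so treat this as the shape of the step rather than a complete redactor.

```python
import re

# Illustrative patterns only; production redaction needs far broader coverage.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
LONG_NUMBER = re.compile(r"\b(?:\d[ -]?){9,15}\d\b")  # 10 to 16 digits, optional separators


def redact(text: str) -> str:
    """Mask emails and long digit sequences before a transcript is persisted."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)
```

Running redaction before indexing, not after, keeps sensitive values out of the retrieval layer entirely, which is easier to defend than filtering at query time.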
5) Photos and videos from the field
Technicians photograph nameplates, damage, and installed configurations. Inspectors capture evidence chains. This media rarely lands in a warehouse row even though it determines warranty and safety outcomes. Computer vision plus metadata extraction turns that media into structured events tied to equipment assets and work orders.
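A minimal sketch of what a "structured event" might look like for a nameplate photo follows. It assumes some OCR step has already produced text from the image; the `NameplateEvent` fields, the `event_from_ocr` helper, and the serial and model patterns are hypothetical and would need tuning against real nameplates.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NameplateEvent:
    work_order_id: str
    asset_serial: str
    model: Optional[str]
    captured_at: datetime
    photo_uri: str  # pointer back to the original image for audit


# Illustrative patterns; real nameplates vary widely by manufacturer.
SERIAL = re.compile(r"S/N\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)
MODEL = re.compile(r"MODEL\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE)


def event_from_ocr(ocr_text: str, work_order_id: str,
                   captured_at: datetime, photo_uri: str) -> Optional[NameplateEvent]:
    """Build an event from OCR text, or return None if no serial is readable."""
    serial = SERIAL.search(ocr_text)
    if serial is None:
        return None  # route to manual review instead of guessing
    model = MODEL.search(ocr_text)
    return NameplateEvent(
        work_order_id=work_order_id,
        asset_serial=serial.group(1).upper(),
        model=model.group(1).upper() if model else None,
        captured_at=captured_at,
        photo_uri=photo_uri,
    )
```

Returning `None` when the serial is unreadable is a design choice: a missing row is recoverable, while a wrong asset link quietly corrupts warranty history.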
Opinion: start from decisions, not from “data lakes”
We see the cleanest programs when leadership names five operational decisions per quarter that suffer from missing facts. Structure the smallest slice of data that improves those decisions, measure lift, then expand. The anti-pattern is a three-year enterprise data initiative that produces governance without throughput.
What to do next
Pick one high-volume unstructured source tied to a metric you already track (time-to-resolve, denial rate, days sales outstanding) and run a Rapid POC that proves extraction and linking quality on your samples. If the numbers hold, you have a business case for production hardening and broader coverage.
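One way to put numbers on "extraction quality" is precision and recall against a hand-labeled sample. The sketch below assumes each extracted fact is represented as a `(document_id, field, value)` tuple, which is our convention for illustration, and compares the pipeline's output with the labeled set.

```python
def precision_recall(predicted: set, expected: set) -> tuple:
    """Compare extracted facts with hand-labeled facts.

    Precision: share of extracted facts that are correct.
    Recall: share of labeled facts that the pipeline found.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up sample: facts as (document_id, field, value) tuples.
predicted = {
    ("t1", "invoice_no", "A-17"),
    ("t1", "amount", "120.00"),   # wrong value, counts against precision
    ("t2", "invoice_no", "B-02"),
}
expected = {
    ("t1", "invoice_no", "A-17"),
    ("t1", "amount", "125.00"),
    ("t2", "invoice_no", "B-02"),
    ("t2", "amount", "40.00"),    # missed entirely, counts against recall
}
precision, recall = precision_recall(predicted, expected)
```

Tracking both numbers matters because they fail differently: low precision puts wrong facts in front of decision makers, while low recall leaves the original haystack mostly intact.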
If you want help choosing the first slice, bring your top ten operational questions to a scoping call. We will tell you which ones are structurable quickly and which ones are research projects, and we will be blunt about the difference.