DataTalk

Project tier down for natural-language querying over company data using constrained intent/slot compilation and an experimental SLM training path.

Concept

Business teams cannot reliably answer data questions without SQL or analyst support.

Product, sales, risk, and operations teams often need fast answers from company data, but they do not know table names, joins, filters, or schema relationships. Generic AI can produce confident but unverified answers, while dashboards only cover fixed questions. DataTalk solves this by turning natural-language questions into schema-aware, SQL-backed, source-row verified answers.

DataTalk natural-language data query pipeline diagram

User Personas

Riya Shah Product Manager
Age32
Experience7 years in B2B SaaS product management and analytics workflows.
BehaviourAsks cross-functional questions across product, sales, support, billing, and risk data.
Pain PointsDoes not know table names or relationships, waits on analysts, and needs fast validation.
Arjun Mehta Risk Team Analyst
Age35
Experience9 years in operational risk, audit reporting, compliance checks, and exception analysis.
BehaviourReviews suspicious accounts, invoices, support tickets, churn signals, and customer risk patterns.
Pain PointsManual checks are slow, joins are complex, and audit work needs traceable source rows.
Neha Kapoor Sales Team Lead
Age29
Experience6 years in enterprise sales, customer growth, and revenue operations.
BehaviourLooks for high-value customers, regional performance, revenue, overdue invoices, and account health.
Pain PointsDepends on RevOps, dashboard filters are limited, and customer signals are hard to connect.

Selected User Persona

Riya Shah, Product Manager

Riya is the strongest starting persona because Product Managers sit between product, sales, support, billing, and leadership. Her workflow naturally tests cross-functional data needs, high follow-up frequency, answer trust, and demo clarity in one user journey.

User Journey Map

Journey Stage Actions Emotion With Emoji Pain Points Opportunities
Frame Business Question Riya starts with a business question in plain language instead of SQL. Curious 🤔 She does not know the exact table names, columns, or relationships. Let the user ask questions without needing schema knowledge.
Check Schema And Data She tries to understand which tables contain product, sales, support, or billing signals. Confused 😕 Table relationships are hard to remember and dashboard filters are limited. Show visual schema, clickable tables, relationships, and sample source rows.
Ask Natural-Language Query She asks a supported business question and expects a reliable answer. Hopeful 🙂 Generic AI may invent columns, joins, or unsupported logic. Use schema-aware intent and slot compilation for supported query families.
Wait For Processing She waits while the system routes the question, compiles SQL, and retrieves data. Impatient ⏳ Waiting without feedback reduces trust in the final answer. Show route, confidence, latency, and processing state before results appear.
Review SQL And Results She checks the answer, generated SQL, and source rows before sharing insights. Careful 🔍 Answers need evidence before they can influence product or leadership decisions. Display SQL, source rows, confidence, route, latency, and answer summary together.
Validate And Follow Up She refines filters, compares rows, and asks follow-up questions across teams. Confident ✅ Manual SQL support slows every iteration and creates dependency on analysts. Keep schema, data, chat, and follow-up workflow in one trusted workspace.

Pain Points

Schema Uncertainty01

Users do not know exact table names, column names, or relationships across product, sales, support, billing, and risk data.

Slow Follow-Up Iteration02

Product and business users need many variations of a question, but every manual SQL request creates delay and analyst dependency.

Limited Dashboard Flexibility03

Dashboards answer fixed questions well, but they break down when users need ad hoc filters, joins, or cross-functional follow-ups.

Low Answer Trust04

Users cannot act on black-box answers unless they can inspect the SQL, route, confidence, latency, and source rows behind the response.

Complex Manual Joins05

Important questions often require joining accounts, invoices, orders, support tickets, and regional data, which is hard for non-technical users.

Weak Workflow Continuity06

Users lose context when schema, data preview, chat, SQL output, and evidence rows are separated across multiple tools.

Pain Point Prioritization

No. Pain Point Time Effort
01 Schema Uncertainty
Time 3
Effort 3
02 Slow Follow-Up Iteration
Time 2
Effort 4
03 Limited Dashboard Flexibility
Time 4
Effort 4
04 Low Answer Trust
Time 1
Effort 1
05 Complex Manual Joins
Time 5
Effort 5
06 Weak Workflow Continuity
Time 3
Effort 2
Time Effort Low High 1 2 3 4 5 6

Solutions

OK Ideas

01. Static data dictionary

Helps understanding but does not answer questions directly.

02. Prebuilt dashboards

Useful for repeated reporting, weak for ad hoc questions.

03. Analyst request form

Organizes work but keeps waiting time and dependency.

Best Ideas

01. Schema-aware chat-to-SQL

Converts natural language into safe SQL using known schema and allowed query families.

02. Visual schema explorer

Shows table relationships, clickable tables, and sample source rows.

03. Verified answer panel

Shows answer summary, compiled SQL, confidence, route, latency, and source rows.

Moonshots

01. DataTalk-SLM

Company-specific small language model for schema-grounded business querying.

02. Governed enterprise copilot

Connects CRM, billing, support, analytics, and warehouse data with permissions.

03. Autonomous insight agent

Detects risks, churn, anomalies, and revenue opportunities proactively.

Moonshot Prioritization

No. Moonshot Time Effort
01 DataTalk-SLM
Time 3
Effort 3
02 Governed Enterprise Copilot
Time 5
Effort 5
03 Autonomous Insight Agent
Time 4
Effort 5
Time Effort Low High 1 2 3

Selected Solution

DataTalk-SLM

DataTalk-SLM is the selected solution because it directly connects user value, product feasibility, and schema-grounded AI. Instead of giving a generic AI chat answer, the system converts natural-language business questions into supported, verifiable data retrieval workflows.

Solution Architecture

DataTalk SLM architecture diagram

Solution Explanation

01. Schema Grounding

The system knows supported tables, columns, relationships, and query families before answering.

02. Intent And Slot Compiler

User questions are converted into constrained intent and slot structures instead of open-ended generation.

03. SQL And Evidence Layer

Every answer can show compiled SQL, source rows, route, confidence, and latency for trust.

04. Visual Data Workspace

Users can inspect schema, preview data, ask questions, and validate results in one workflow.

05. SLM Training Path

The future model can learn company schema and question patterns while keeping production querying controlled.

06. Business Workflow Fit

The product helps PMs and business teams reduce analyst dependency and move faster from question to decision.

Final Product Direction

DataTalk is a schema-aware data copilot that helps business teams move from natural-language questions to verified SQL-backed answers. The product direction is to evolve the current demo into DataTalk-SLM: a company-specific small language model trained for fast, grounded, and trustworthy business querying.