DataTalk

Project teardown of natural-language querying over company data, using constrained intent/slot compilation and an experimental small language model (SLM) training path.

GitHub Repo ↗ Explore Prototype ↗

Concept

Business teams cannot reliably answer data questions without SQL or analyst support.

Product, sales, risk, and operations teams often need fast answers from company data, but they do not know table names, joins, filters, or schema relationships. Generic AI can produce confident but unverified answers, while dashboards only cover fixed questions. DataTalk solves this by turning natural-language questions into schema-aware, SQL-backed, source-row verified answers.

User Personas

Riya Shah, Product Manager

Age	32
Experience	7 years in B2B SaaS product management and analytics workflows.
Behaviour	Asks cross-functional questions across product, sales, support, billing, and risk data.
Pain Points	Does not know table names or relationships, waits on analysts, and needs fast validation.

Arjun Mehta, Risk Team Analyst

Age	35
Experience	9 years in operational risk, audit reporting, compliance checks, and exception analysis.
Behaviour	Reviews suspicious accounts, invoices, support tickets, churn signals, and customer risk patterns.
Pain Points	Manual checks are slow, joins are complex, and audit work needs traceable source rows.

Neha Kapoor, Sales Team Lead

Age	29
Experience	6 years in enterprise sales, customer growth, and revenue operations.
Behaviour	Looks for high-value customers, regional performance, revenue, overdue invoices, and account health.
Pain Points	Depends on RevOps, dashboard filters are limited, and customer signals are hard to connect.

Selected User Persona

Riya Shah, Product Manager

Riya is the strongest starting persona because Product Managers sit between product, sales, support, billing, and leadership. Her workflow naturally tests cross-functional data needs, high follow-up frequency, answer trust, and demo clarity in one user journey.

User Journey Map

Journey Stage	Actions	Emotion	Pain Points	Opportunities
Frame Business Question	Riya starts with a business question in plain language instead of SQL.	Curious 🤔	She does not know the exact table names, columns, or relationships.	Let the user ask questions without needing schema knowledge.
Check Schema And Data	She tries to understand which tables contain product, sales, support, or billing signals.	Confused 😕	Table relationships are hard to remember and dashboard filters are limited.	Show visual schema, clickable tables, relationships, and sample source rows.
Ask Natural-Language Query	She asks a supported business question and expects a reliable answer.	Hopeful 🙂	Generic AI may invent columns, joins, or unsupported logic.	Use schema-aware intent and slot compilation for supported query families.
Wait For Processing	She waits while the system routes the question, compiles SQL, and retrieves data.	Impatient ⏳	Waiting without feedback reduces trust in the final answer.	Show route, confidence, latency, and processing state before results appear.
Review SQL And Results	She checks the answer, generated SQL, and source rows before sharing insights.	Careful 🔍	Answers need evidence before they can influence product or leadership decisions.	Display SQL, source rows, confidence, route, latency, and answer summary together.
Validate And Follow Up	She refines filters, compares rows, and asks follow-up questions across teams.	Confident ✅	Manual SQL support slows every iteration and creates dependency on analysts.	Keep schema, data, chat, and follow-up workflow in one trusted workspace.

Pain Points

01. Schema Uncertainty

Users do not know exact table names, column names, or relationships across product, sales, support, billing, and risk data.

02. Slow Follow-Up Iteration

Product and business users need many variations of a question, but every manual SQL request creates delay and analyst dependency.

03. Limited Dashboard Flexibility

Dashboards answer fixed questions well, but they break down when users need ad hoc filters, joins, or cross-functional follow-ups.

04. Low Answer Trust

Users cannot act on black-box answers unless they can inspect the SQL, route, confidence, latency, and source rows behind the response.

05. Complex Manual Joins

Important questions often require joining accounts, invoices, orders, support tickets, and regional data, which is hard for non-technical users.

06. Weak Workflow Continuity

Users lose context when schema, data preview, chat, SQL output, and evidence rows are separated across multiple tools.

Pain Point Prioritization

No.	Pain Point	Time (1-5)	Effort (1-5)
01	Schema Uncertainty	3	3
02	Slow Follow-Up Iteration	2	4
03	Limited Dashboard Flexibility	4	4
04	Low Answer Trust	1	1
05	Complex Manual Joins	5	5
06	Weak Workflow Continuity	3	2

Figure: pain points 01 to 06 plotted by Time and Effort, each axis running from Low to High.

Solutions

OK Ideas

01. Static data dictionary

Helps understanding but does not answer questions directly.

02. Prebuilt dashboards

Useful for repeated reporting, weak for ad hoc questions.

03. Analyst request form

Organizes work but keeps waiting time and dependency.

Best Ideas

01. Schema-aware chat-to-SQL

Converts natural language into safe SQL using known schema and allowed query families.

02. Visual schema explorer

Shows table relationships, clickable tables, and sample source rows.

03. Verified answer panel

Shows answer summary, compiled SQL, confidence, route, latency, and source rows.
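
The verified answer panel can be pictured as a single record that carries the answer together with its evidence. The Python sketch below is illustrative only and is not taken from the prototype: the `VerifiedAnswer` dataclass, the `answer_with_evidence` helper, the `invoices` table, and the `route` and `confidence` values are all assumptions made for the example. The source does not say how DataTalk computes route or confidence, so they are passed in as placeholders.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class VerifiedAnswer:
    """Hypothetical payload for the answer panel."""
    summary: str
    sql: str
    route: str          # placeholder label for how the question was handled
    confidence: float   # placeholder score supplied by the caller
    latency_ms: float
    source_rows: list


def answer_with_evidence(conn, sql, params, route, confidence):
    """Run already-compiled SQL and return the answer with its evidence."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    latency_ms = (time.perf_counter() - start) * 1000
    summary = f"{len(rows)} matching row(s)"
    return VerifiedAnswer(summary, sql, route, confidence, latency_ms, rows)


# Tiny in-memory dataset, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER, account TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 1200.0, "overdue"),
        (2, "Globex", 300.0, "paid"),
        (3, "Initech", 950.0, "overdue"),
    ],
)

result = answer_with_evidence(
    conn,
    "SELECT id, account, amount FROM invoices "
    "WHERE status = :status ORDER BY amount DESC",
    {"status": "overdue"},
    route="compiled",
    confidence=0.9,
)
print(result.summary)
print(result.sql)
for row in result.source_rows:
    print(row)
```

Keeping the SQL, the rows, and the timing in one record means the interface can show all of them side by side instead of asking the user to trust the summary alone.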

Moonshots

01. DataTalk-SLM

Company-specific small language model for schema-grounded business querying.

02. Governed enterprise copilot

Connects CRM, billing, support, analytics, and warehouse data with permissions.

03. Autonomous insight agent

Detects risks, churn, anomalies, and revenue opportunities proactively.

Moonshot Prioritization

No.	Moonshot	Time (1-5)	Effort (1-5)
01	DataTalk-SLM	3	3
02	Governed Enterprise Copilot	5	5
03	Autonomous Insight Agent	4	5

Figure: moonshots 01 to 03 plotted by Time and Effort, each axis running from Low to High.

Selected Solution

DataTalk-SLM

DataTalk-SLM is the selected solution because it directly connects user value, product feasibility, and schema-grounded AI. Instead of giving a generic AI chat answer, the system converts natural-language business questions into supported, verifiable data retrieval workflows.

Solution Architecture

Solution Explanation

01. Schema Grounding

The system knows supported tables, columns, relationships, and query families before answering.

02. Intent And Slot Compiler

User questions are converted into constrained intent and slot structures instead of open-ended generation.

03. SQL And Evidence Layer

Every answer can show compiled SQL, source rows, route, confidence, and latency for trust.

04. Visual Data Workspace

Users can inspect schema, preview data, ask questions, and validate results in one workflow.

05. SLM Training Path

The future model can learn company schema and question patterns while keeping production querying controlled.

06. Business Workflow Fit

The product helps PMs and business teams reduce analyst dependency and move faster from question to decision.
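
The schema grounding and intent/slot compilation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the prototype's actual code: the `SCHEMA` dictionary, the `overdue_invoices_by_region` query family, the `Intent` dataclass, and the `compile_intent` function are hypothetical names, and the tables and columns are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the only tables and columns the compiler may reference.
SCHEMA = {
    "accounts": {"id", "name", "region"},
    "invoices": {"id", "account_id", "amount", "status"},
}

# Hypothetical query families: a fixed SQL template, the slots it accepts,
# and the schema elements it touches.
QUERY_FAMILIES = {
    "overdue_invoices_by_region": {
        "sql": (
            "SELECT a.name, i.amount FROM invoices AS i "
            "JOIN accounts AS a ON a.id = i.account_id "
            "WHERE i.status = 'overdue' AND a.region = :region "
            "ORDER BY i.amount DESC LIMIT :limit"
        ),
        "slots": {"region": str, "limit": int},
        "uses": {
            "accounts": {"id", "name", "region"},
            "invoices": {"account_id", "amount", "status"},
        },
    },
}


@dataclass
class Intent:
    """A constrained intent: a query family name plus its slot values."""
    family: str
    slots: dict = field(default_factory=dict)


def compile_intent(intent):
    """Return (sql, params) for a supported intent, or raise ValueError."""
    family = QUERY_FAMILIES.get(intent.family)
    if family is None:
        raise ValueError(f"unsupported query family: {intent.family}")

    # Schema grounding: the template may only use known tables and columns.
    for table, columns in family["uses"].items():
        if table not in SCHEMA or not columns <= SCHEMA[table]:
            raise ValueError(f"template references unknown schema: {table}")

    # Slot validation: exactly the expected slot names, with expected types.
    expected = family["slots"]
    if set(intent.slots) != set(expected):
        raise ValueError(f"expected slots {sorted(expected)}")
    for name, slot_type in expected.items():
        if not isinstance(intent.slots[name], slot_type):
            raise ValueError(f"slot {name!r} must be {slot_type.__name__}")

    # Slot values are returned as bound parameters, never spliced into the SQL.
    return family["sql"], dict(intent.slots)


sql, params = compile_intent(
    Intent("overdue_invoices_by_region", {"region": "EMEA", "limit": 5})
)
print(sql)
print(params)
```

In this sketch, a question that maps to no known family fails loudly instead of producing guessed SQL, and user-supplied values only ever reach the database as bound parameters. Mapping free text to an `Intent` is left out here; that step is where a classifier or the experimental SLM would sit.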

Final Product Direction

DataTalk is a schema-aware data copilot that helps business teams move from natural-language questions to verified SQL-backed answers. The product direction is to evolve the current demo into DataTalk-SLM: a company-specific small language model trained for fast, grounded, and trustworthy business querying.
