What AI Still Gets Wrong in Indian Tax: A CA's Caveats

CA Prateek Agarwal

AI has become a genuinely useful first draft for a lot of Indian tax work — research, notice replies, computation logic, client explainers. But it fails in ways that are predictable once you have seen them a few times, and the failures are dangerous precisely because the output reads so confidently. This is an honest catalogue of where AI still gets Indian tax wrong, and the specific guardrail for each — written for CAs who are already using it and want to stop getting burned.

Why this matters more in tax than almost anywhere else

Tax is a "your money or your life" domain. A wrong answer does not just embarrass you — it produces a short-paid return, a disallowed deduction, an interest-and-penalty exposure, and your name on the filing. Unlike a marketing draft you can eyeball, a tax answer can be wrong in a way that only a specialist notices, and the client trusted you precisely because they cannot. So the question is never "is AI smart enough?" It is "where does it fail, and can I catch the failure before it reaches a return?"

The honest answer is that AI fails in six recurring ways on Indian tax. Each has a clean guardrail. None of them means you should stop using it. They mean you should treat it like a competent articled assistant who still needs supervision: useful for the legwork, never trusted for the final position.

1. Hallucinated section numbers and case citations

This is the failure that gets CAs into real trouble, because it is the one that looks most authoritative. Ask a general-purpose model a tax question and it will frequently cite a section number, a sub-clause, a rule, or a tribunal decision that sounds exactly right — correct format, plausible-sounding parties, a confident "as held in" — and is simply invented. The model is completing a pattern, not retrieving a fact. Citations are the highest-risk part of any AI tax output because they are the part you are most tempted to copy verbatim into a reply or an opinion.

The guardrail: never let a citation reach a client document until you have opened the bare provision or the actual order. Treat every section number and every case name from a generic LLM as unverified by default. If you cannot independently confirm it on the income-tax site, a paid database, or the original judgment, it does not go in. This single discipline removes the most damaging category of AI tax error.
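
One way to make that discipline mechanical is to pull every section reference out of an AI draft into a checklist where each item starts as unverified. The sketch below is a minimal illustration, not a complete citation parser: the pattern and the function names are assumptions made for this example, it only catches singular forms such as "Section 115BAC" or "Sec. 194J(1)(b)", and it will miss plurals, hyphenated sections, rules, circulars, and case names.

```python
import re

# Illustrative pattern (an assumption, not an exhaustive grammar):
# matches "Section 115BAC", "section 80C", "Sec. 194J(1)(b)".
SECTION_RE = re.compile(
    r"\b[Ss]ec(?:tion|\.)\s*(\d+[A-Z]*(?:\([0-9a-zA-Z]+\))*)"
)


def extract_section_refs(text: str) -> list[str]:
    """Return unique section references in order of first appearance."""
    seen: list[str] = []
    for match in SECTION_RE.finditer(text):
        ref = match.group(1)
        if ref not in seen:
            seen.append(ref)
    return seen


def verification_checklist(text: str) -> list[dict]:
    """Every reference starts unverified; a human flips the flag
    only after opening the bare provision."""
    return [{"section": ref, "verified": False}
            for ref in extract_section_refs(text)]
```

The point of the design is the default: nothing the model cites is marked verified by the code, only by the person who opened the source.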

2. Stale provisions — the law moved, the model didn't

Indian tax law changes every single year. The Finance Act rewrites rates, thresholds, surcharge slabs, and the default-versus-optional posture of the new regime under Section 115BAC; CBDT issues circulars and notifications mid-year; GST rates and conditions shift through Council meetings. A model trained before a given Budget will answer with last year's law, and it will do so with the same confidence it uses for settled provisions. The classic trap is the old regime versus new regime default and the deduction set available under each: this has moved repeatedly, and AI routinely answers with a stale version.

The guardrail: treat anything rate-, slab-, threshold-, or regime-dependent as time-sensitive by definition, and always re-anchor it to the assessment year you are actually filing for. Ask the model which AY its answer assumes, then verify that year's numbers against the current Finance Act and CBDT material yourself. For anything touched by the latest Budget, assume the model is behind until proven otherwise. India-law-trained tools that refresh their corpus mitigate this, but even they need a date check.
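
As a small aid to that re-anchoring, the assessment year can be derived from the date of the transaction rather than typed from memory. This is a minimal sketch assuming the conventional labelling in which the financial year runs from 1 April to 31 March and the assessment year is the year that follows it; the function names are illustrative.

```python
from datetime import date


def financial_year_start(d: date) -> int:
    """Calendar year in which the financial year containing d began.
    Assumes the financial year runs 1 April to 31 March."""
    return d.year if d.month >= 4 else d.year - 1


def assessment_year_label(d: date) -> str:
    """Label of the assessment year for income earned on date d,
    taken as the year following the financial year."""
    ay_start = financial_year_start(d) + 1
    return f"AY {ay_start}-{str(ay_start + 1)[-2:]}"
```

Putting the resulting label at the top of the prompt, and asking the model to repeat it back, makes a mismatch between the year you are filing for and the year the model assumed much easier to spot.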

3. Generic, US-centric answers that miss the Indian specifics

General-purpose models have read far more US tax content than Indian, so on a thinly specified prompt they drift toward American defaults — "write-offs," "1099s," capital-gains logic that does not match our holding periods and indexation rules. Even when they stay in India, they miss the local texture that is the whole job: TDS section-by-section nuances and rate differences, place-of-supply rules that decide whether a transaction is inter-state or intra-state, the interaction between TDS and TCS, presumptive taxation conditions, and state-level variations. The answer is not wrong in a dramatic way — it is generic in a way that quietly omits the thing that actually decides the case.

The guardrail: specify Indian context aggressively in the prompt — the section, the AY, the nature of the assessee, the state where relevant — and read the answer for what it left out, not just what it got wrong. The most useful mental check: "would a model that had only read US tax have produced this same paragraph?" If yes, it has not engaged with the Indian specifics, and you need to push it or do that part yourself. This is also where a domain tool earns its keep — see the case for India-law-trained tools below.
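
Specifying context aggressively is easier when the context is a fixed preamble rather than something retyped each time. The sketch below shows one possible shape for such a preamble; the `TaxContext` fields and the wording of the instructions are assumptions for illustration, not a recommended standard prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxContext:
    """Hypothetical container for the facts the prompt should always carry."""
    assessment_year: str
    assessee_type: str
    section: Optional[str] = None
    state: Optional[str] = None


def build_prompt(question: str, ctx: TaxContext) -> str:
    """Prefix the question with explicit Indian context."""
    lines = [
        "Jurisdiction: India. Answer under Indian tax law only.",
        f"Assessment year: {ctx.assessment_year}",
        f"Assessee: {ctx.assessee_type}",
    ]
    if ctx.section:
        lines.append(f"Provision in question: Section {ctx.section}")
    if ctx.state:
        lines.append(f"State: {ctx.state}")
    # Ask the model to surface its assumptions instead of hiding them.
    lines.append("State any fact you are assuming and which year's law you are applying.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

A preamble like this does not make the answer correct; it only removes the excuse for a generic one, so the read-for-omissions check above still applies.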

4. Confident-but-wrong on interpretation and judgement

A lot of Indian tax is not lookup — it is judgement. Is this expenditure capital or revenue? Is this a contract for service attracting one TDS section or another? Is this arrangement a genuine business restructuring or something an assessing officer will recharacterise? AI is weakest exactly here, because interpretive questions have no single retrievable answer; they turn on facts, intent, and a body of conflicting decisions. The model will still give you a clean, confident verdict — and the confidence is inversely related to how settled the question actually is.

The guardrail: use AI to lay out the competing positions, not to pick one. A good prompt is "set out the arguments for treating this as capital and the arguments for revenue, with the factors that decide it" — not "is this capital or revenue?" The moment a question requires weighing facts against a grey area of law, the AI output is an input to your judgement, never a substitute for it. The professional position, and the liability for it, stays with you.

5. It cannot see facts it was not given

This is the most underrated failure mode. An LLM only knows what is in the prompt and its training data. It cannot see the client's books, the actual TDS deducted, the bank statements, the prior years' returns, or the specific clauses in the agreement you are looking at. So when you ask a question that depends on the client's actual numbers, the model will either ask for them or — worse — quietly assume them and answer as if it knew. Computations built on assumed facts look complete and are useless.

The guardrail: give the model the facts explicitly, and be suspicious of any answer that did not first establish what it needed. For anything turning on real figures — a TDS reconciliation, an advance-tax computation, a capital-gains working — the data has to come from your records, run through your tools, not from the model's imagination. Tools that connect to actual client data (ledgers, 26AS/AIS, GST returns) are categorically different from a chat model answering in a vacuum. For the data-grounded version of this work, see AI for the ITR filing workflow and automating TDS reconciliation with AI.

6. Confidentiality blind spots

When you paste a client's PAN, financials, draft assessment order, or the text of a notice into a consumer AI chat, you may be sending privileged client data to a third party whose retention and training practices you do not control. This is not a hypothetical compliance concern — it sits squarely against your obligations to the client and, increasingly, under the Digital Personal Data Protection Act. The model did not get the tax wrong here; you created an exposure the client never agreed to.

The guardrail: decide deliberately what may and may not be pasted into which tool. Use enterprise or India-hosted tools with clear no-training and data-handling terms for anything client-identifiable, and strip identifiers when using consumer tools for generic research. We covered this in depth in the DPDP Act and AI tools handling client data — it is the one caveat on this list that is about your conduct, not the model's accuracy, and the one regulators will ask about first.
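
Stripping identifiers can be partly automated before text goes anywhere near a consumer tool. The sketch below masks strings shaped like a PAN (five letters, four digits, one letter) or a GSTIN (fifteen characters with the PAN embedded after a two-digit state code). It is a pattern match only, under those format assumptions: it does not validate checksums and it will not catch names, addresses, account numbers, or amounts, so it supplements a manual read rather than replacing it.

```python
import re

# Format-based patterns only; no checksum validation is attempted.
GSTIN_RE = re.compile(r"\b[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][0-9A-Z]Z[0-9A-Z]\b")
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def redact_identifiers(text: str) -> str:
    """Replace PAN- and GSTIN-shaped strings with placeholders."""
    # GSTIN first, since a GSTIN contains a PAN-shaped substring.
    text = GSTIN_RE.sub("[GSTIN]", text)
    text = PAN_RE.sub("[PAN]", text)
    return text
```

For example, `redact_identifiers("PAN ABCDE1234F, GSTIN 27ABCDE1234F1Z5")` returns `"PAN [PAN], GSTIN [GSTIN]"`.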

Why India-law-trained tools beat generic LLMs

Most of the failures above — invented citations, stale provisions, US drift, confidentiality — are far less severe with tools built specifically for Indian tax than with a general chat model. A domain tool is grounded in an Indian legal corpus, retrieves rather than guesses, and tends to cite a real source you can click through to. That retrieval-and-citation design is the structural fix for hallucination.

In practice this is the category to reach for on client work:

  • TaxBotGPT — an AI tax assistant trained on Indian tax law, giving cited answers and helping draft notices.
  • Taxmann.ai — AI legal and tax research and drafting backed by Taxmann's content.
  • VIDUR — an AI assistant for Indian tax, corporate, and regulatory law research and drafting.
  • Vaive — an AI co-pilot for GST litigation, research, and drafting.

But "better" is not "verified." A grounded tool can still surface a decision that does not support the proposition you are citing it for, can still be a Budget behind on a rate, and still cannot see your client's books. The citation that matters is the one you opened yourself. India-law-trained tools change the base rate of error dramatically; they do not transfer the responsibility off your shoulders.

A practical verification routine

The way to use AI in tax without getting burned is to make verification a fixed step, not an afterthought:

  1. Specify the AY and the Indian context up front so you are not fighting stale law and US defaults from the start.
  2. Open every citation. Section, rule, circular, case — confirm it against the bare provision or the original order before it enters any client document.
  3. Separate lookup from judgement. Let AI retrieve and summarise; you decide the position on anything interpretive.
  4. Feed it real facts, or do not trust the figures. Computations come from your records and your tools, never from assumed numbers.
  5. Decide what may be pasted where before you paste anything client-identifiable.
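
The five steps above can also be kept as an explicit sign-off record, so that nothing reaches a client until each one has been ticked by a person. This is a minimal sketch; the field names are illustrative labels for the steps, not part of any real tool.

```python
from dataclasses import dataclass, fields


@dataclass
class VerificationLog:
    """One flag per step of the routine; all start unticked."""
    ay_and_context_specified: bool = False
    citations_opened: bool = False
    judgement_reviewed_by_ca: bool = False
    figures_from_records: bool = False
    data_sharing_cleared: bool = False


def pending_steps(log: VerificationLog) -> list[str]:
    """Names of the steps not yet confirmed, in routine order."""
    return [f.name for f in fields(log) if not getattr(log, f.name)]


def ready_for_client(log: VerificationLog) -> bool:
    """True only when every step has been confirmed."""
    return not pending_steps(log)
```

Defaulting every flag to false mirrors the routine itself: the draft is assumed unverified until someone has done the work.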

For the notice-drafting workflow specifically, prompting AI for GST notices walks through how to get a usable first draft while keeping the verification discipline intact.

Frequently asked questions

Can I rely on AI for Indian tax research?

As a first draft, yes — to surface the relevant provisions, frame the issue, and produce a structured summary far faster than starting cold. As a final authority, no. The two failure modes that catch people are hallucinated citations and stale law, so anything you carry into a client document must be verified against the bare provision and confirmed for the correct assessment year. India-law-trained tools that cite their sources are far safer than a general chat model, but the verification step does not go away.

Why does AI invent section numbers and case citations?

Because a language model completes plausible patterns rather than retrieving facts. A citation has a very regular format, so the model can generate one that looks perfectly correct — right section style, plausible party names — without any real source behind it. This is why citation is the single highest-risk part of AI tax output, and why opening the actual provision or order before using it is non-negotiable.

Are India-specific AI tax tools more accurate than ChatGPT for Indian tax?

Generally yes, for client work. Tools trained on an Indian legal corpus retrieve from real sources, cite them, and drift far less toward US tax defaults, which directly reduces the hallucination and generic-answer problems. They are also a better answer on confidentiality if they offer no-training, India-hosted terms. But they can still be behind on the latest Budget and still cannot see your client's books — so they raise the floor on accuracy without removing your obligation to verify.

Will AI eventually replace tax professionals once these failures are fixed?

No. The failures that are easy to fix — citations, stale rates — are improving fast, but the hard ones are structural: AI cannot see facts it was not given, and it cannot own the judgement and the liability on an interpretive position. Those are the core of the work. See Will AI Replace Chartered Accountants in India? for the longer argument.

The takeaway

AI is now a legitimate part of the Indian tax workflow, and pretending otherwise just means doing the legwork by hand. But its failures are specific and repeatable: it invents citations, lags the latest Budget, defaults to generic or US answers, sounds most confident exactly where the law is greyest, cannot see facts it was not handed, and creates confidentiality exposure if you are careless. Each has a guardrail, and the common thread is verification — open the citation, fix the assessment year, feed it real facts, and keep the judgement and the sign-off with the professional. Use India-law-trained tools for client work because they fail less, then verify anyway. Browse the software directory to see the tools built for the Indian regime.
