AI & Technology

Heterograph Neural Networks Plus LLMs to Expose Collusive Billing

By Jimmy Joseph

Most billing abuse does not look dramatic when viewed one claim at a time. A single encounter may appear ordinary, a single procedure may be defensible, and a single provider may not stand out in a traditional scorecard. The real signal often lives in relationships. The same patients may circulate through the same providers. The same diagnosis and procedure combinations may repeat across an unusually tight cluster. Referral paths may start to look less like normal care coordination and more like an organized pattern. In that sense, collusive billing is not just a data problem. It is a network problem.

That is exactly why heterograph neural networks deserve more attention in healthcare payment integrity. Claims ecosystems are naturally heterogeneous. They involve different entity types, different relationships, different timelines, and different levels of meaning. Patients, providers, claims, diagnoses, procedures, facilities, referrals, and time windows all interact in ways that are hard to represent in a flat row-and-column table. A heterograph gives those relationships structure. A graph neural network can then learn from that structure.

But detection is only half the job. Investigators, auditors, and payment integrity teams do not act on embeddings or anomaly scores alone. They need a grounded explanation they can read, verify, and use. That is where LLMs become powerful – not as the detector itself, but as the explanation and investigation layer on top of graph evidence.

The combination is compelling. The graph model can surface suspicious relational patterns that a tabular system might miss. The LLM can translate those patterns into a narrative that makes sense to humans. And once that workflow becomes interactive, modern serving techniques such as paged KV caches and cache reuse become essential for making the system practical at scale.

Why claims data should be modeled as a heterograph

Claims data is often stored as a transaction log, which makes sense for processing but not always for understanding behavior. A claim belongs to a patient. It is billed by one provider and may involve another. It includes diagnoses, procedures, possibly modifiers, place-of-service information, and timing. It may be connected to other claims through shared entities, repeated episodes, referral chains, prior authorizations, or utilization patterns. Treating all of that as a single feature vector can work for some tasks, but it compresses away the structure that often matters most in collusive behavior.

A heterograph preserves that structure. Different node types can represent patients, providers, claims, ICD codes, CPT or HCPCS procedures, facilities, and time buckets. Different edge types can represent treatment relationships, billing relationships, referral relationships, co-occurrence, temporal succession, or shared organizational affiliation. Once those connections exist, the system can learn patterns that are difficult to express manually. It can notice when the same small group of providers repeatedly shares unusually similar billing profiles. It can identify dense patient-provider-procedure motifs. It can detect the kind of repeated cross-entity behavior that looks weak in isolation but suspicious in aggregate.

This matters because collusion is rarely a simple outlier problem. It is often a repeated coordination problem. That makes graph-native modeling much more aligned with the underlying reality of the data.
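To make the idea concrete, here is a minimal, dependency-free sketch of a heterograph as typed nodes plus typed, directed edges. The type names, IDs, and codes below are illustrative assumptions, not a fixed schema; production systems would typically use a graph library such as PyTorch Geometric's HeteroData instead.

```python
from collections import defaultdict

class HeteroGraph:
    """Toy heterograph: nodes keyed by type, edges keyed by (src_type, relation, dst_type)."""

    def __init__(self):
        self.nodes = defaultdict(set)    # node_type -> {node_id}
        self.edges = defaultdict(list)   # (src_type, relation, dst_type) -> [(src_id, dst_id)]

    def add_edge(self, src_type, relation, dst_type, src_id, dst_id):
        # Adding an edge implicitly registers both endpoint nodes.
        self.nodes[src_type].add(src_id)
        self.nodes[dst_type].add(dst_id)
        self.edges[(src_type, relation, dst_type)].append((src_id, dst_id))

# Hypothetical entities: a patient, two providers, a claim, and a CPT code.
g = HeteroGraph()
g.add_edge("patient", "treated_by", "provider", "P001", "DR42")
g.add_edge("provider", "billed", "claim", "DR42", "C900")
g.add_edge("claim", "has_procedure", "cpt", "C900", "99215")
g.add_edge("provider", "referred_to", "provider", "DR42", "DR77")
```

Because each edge carries its own (source type, relation, destination type) key, a "treated_by" edge can never be confused with a "referred_to" edge, which is exactly the distinction a flat table loses.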

What graph neural networks see that tabular models often miss

Tabular models are still useful in payment integrity. They are fast, interpretable in familiar ways, and often effective when the goal is to flag claims with unusual values or combinations of features. But collusive billing often depends on context that does not sit neatly inside one row. Who referred whom matters. Which patients recur across a provider cluster matters. How procedure codes propagate through a network matters. Whether a billing pattern is isolated or shared across a ring matters.

Graph neural networks are built for exactly that kind of problem. They do not just learn from an entity's own features. They aggregate information from neighbors and from the structure of the surrounding graph. A model such as GraphSAGE can support evolving graphs where new providers, patients, or claims appear over time. Attention-based graph models can learn that some neighbors should influence a prediction more than others. Heterogeneous graph architectures go further by respecting the fact that not all nodes and edges mean the same thing. A patient-to-provider edge is not the same as a claim-to-diagnosis edge, and a diagnosis-to-procedure pattern is not the same as a referral path.
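As a sketch of what "aggregating per edge type" means, here is one weight-free message-passing step in plain Python: neighbor features are averaged separately per relation, then blended with the node's own features. Real layers (GraphSAGE, attention-based, heterogeneous variants) apply learned transforms per relation; everything here, including the blending rule, is a simplifying assumption.

```python
def hetero_mean_layer(features, edges):
    """One toy heterogeneous aggregation step.

    features: {(node_type, node_id): [float, ...]}
    edges:    {(src_type, relation, dst_type): [(src_id, dst_id), ...]}
    """
    # Collect incoming neighbor features, bucketed by relation type.
    msgs = {}
    for (st, rel, dt), pairs in edges.items():
        for src, dst in pairs:
            msgs.setdefault((dt, dst), {}).setdefault(rel, []).append(features[(st, src)])

    out = {}
    for key, feat in features.items():
        per_rel = []
        for vecs in msgs.get(key, {}).values():
            # Mean over neighbors reached through one relation type.
            per_rel.append([sum(col) / len(vecs) for col in zip(*vecs)])
        if per_rel:
            # Mean across relation summaries, then blend with self features.
            rel_mean = [sum(col) / len(per_rel) for col in zip(*per_rel)]
            out[key] = [(a + b) / 2 for a, b in zip(feat, rel_mean)]
        else:
            out[key] = list(feat)  # isolated node keeps its own features
    return out

# Tiny example: a provider's representation shifts toward its patient's features.
feats = {("patient", "x"): [3.0], ("provider", "A"): [1.0]}
rels = {("patient", "treated_by", "provider"): [("x", "A")]}
updated = hetero_mean_layer(feats, rels)
```

The point of the sketch is the bookkeeping, not the math: messages never mix across relation types, so a referral neighbor and a billing neighbor influence a node through separate channels.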

That relational inductive bias is the key advantage. Instead of flattening the network into handcrafted statistics and hoping the important structure survives, the model learns directly from the relational system itself. In practice, that can surface patterns such as suspicious provider rings, recurring claim bundles, improbable referral clusters, and reuse of the same behavioral template across multiple actors. These are exactly the kinds of signals that can be missed when the model only sees one claim at a time.

Detection alone is not enough

Even an excellent graph model will fail to create operational value if it cannot explain why a case matters. In healthcare payment integrity, a scored lead is not the final product. The final product is an investigation that a human can understand and act on. That means the system needs to answer questions such as: Which entities are driving the alert? What subgraph is most relevant? Which claims or code combinations are repeated? How does the provider compare to peers? What changed over time? What should the investigator look at next?

This is where the system should shift from graph scoring to evidence extraction. Rather than sending the LLM a vague risk score and asking it to improvise, the better approach is to first isolate the suspicious subgraph, gather the supporting artifacts, and then use the LLM to write from that grounded evidence. In other words, the graph model identifies the "where," the evidence layer identifies the "why," and the LLM explains the "what it means."

That distinction matters. It keeps the LLM in a role where it adds clarity rather than uncontrolled judgment. The model is not inventing suspicion. It is narrating structured evidence. It can explain that a provider cluster shares repeated patient overlap, unusually similar procedure sequences, concentrated referral paths, and code usage that diverges from peer norms over a defined time window. It can summarize those findings in plain language, reference the exact entities involved, and suggest next-best actions such as chart review, peer comparison, provider outreach, or rule refinement.

This is the moment where AI becomes usable for investigators. A graph score may impress a data scientist, but a grounded narrative helps an auditor decide what to do.

From suspicious subgraphs to investigator-ready narratives

A strong graph-to-text pipeline begins with disciplined evidence handling. First, the system ranks suspicious regions of the graph. Then it extracts a compact explanation subgraph containing the nodes, edges, and features that contributed most to the ranking. After that, it retrieves the supporting business context: claim details, procedure descriptions, diagnosis context, policy logic, peer benchmarks, referral summaries, utilization baselines, and prior investigation notes where appropriate.
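A minimal version of that extraction step might look like the sketch below: rank nodes by a suspicion score, keep the top seeds, and retain every edge touching a seed as the explanation subgraph. The scores, entity names, and one-hop cutoff are all illustrative assumptions; a real system would derive attributions from the GNN itself.

```python
def explanation_subgraph(scores, edges, k=2):
    """Keep the top-k scored nodes and their one-hop neighborhood.

    scores: {(node_type, node_id): float}
    edges:  {(src_type, relation, dst_type): [(src_id, dst_id), ...]}
    """
    seeds = set(sorted(scores, key=scores.get, reverse=True)[:k])
    keep_nodes, keep_edges = set(seeds), []
    for (st, rel, dt), pairs in edges.items():
        for src, dst in pairs:
            s, d = (st, src), (dt, dst)
            if s in seeds or d in seeds:  # edge touches a seed node
                keep_edges.append((st, rel, dt, src, dst))
                keep_nodes.update([s, d])
    return keep_nodes, keep_edges

# Hypothetical scores: one provider and one patient look suspicious.
scores = {("provider", "A"): 0.9, ("patient", "x"): 0.8, ("provider", "B"): 0.1}
edges = {("patient", "treated_by", "provider"): [("x", "A"), ("x", "B")]}
nodes, kept = explanation_subgraph(scores, edges, k=2)
```

Note that provider B, though low-scoring on its own, is pulled into the evidence set because it shares a patient with the seeds, which is precisely the "weak in isolation, suspicious in aggregate" behavior the article describes.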

Only then should the LLM step in. At that point, the LLM can do something highly valuable. It can convert a dense technical signal into a readable explanation. It can say that the flagged pattern is not based on one claim but on repeated interaction across a small cluster of patients and providers. It can explain that the same procedure mix recurs with unusual frequency relative to peers. It can note that the activity is concentrated in a narrow referral loop and that the timing pattern is more consistent with coordinated billing than with typical case variation. It can also adapt the same evidence into multiple output styles: a short alert summary, an auditor memo, a case triage note, or an executive-level explanation.

This is where LLMs shine in payment integrity. They reduce the distance between machine-detected structure and human action. They do not replace investigators. They reduce the friction investigators face when trying to interpret a complex case.
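Grounding, in practice, often comes down to prompt construction: the LLM receives the extracted evidence verbatim and is instructed to narrate only that. The sketch below assembles such a prompt; the field names, template wording, and case values are all hypothetical.

```python
def build_audit_prompt(case):
    """Assemble a grounded prompt from a structured evidence bundle."""
    lines = [
        "You are drafting an auditor memo. Use ONLY the evidence below.",
        f"Flagged cluster: {', '.join(case['providers'])}",
        f"Shared patients: {case['shared_patients']}",
        f"Peer deviation: {case['peer_deviation']}",
        "Evidence edges:",
    ]
    # Each evidence edge becomes one verifiable line the LLM can cite.
    lines += [f"- {src} -{rel}-> {dst}" for src, rel, dst in case["edges"]]
    lines.append("Summarize the pattern and suggest one next action.")
    return "\n".join(lines)

# Hypothetical evidence bundle produced by the graph and retrieval layers.
case = {
    "providers": ["DR42", "DR77"],
    "shared_patients": 14,
    "peer_deviation": "+3.1 sigma on CPT 99215 frequency",
    "edges": [("DR42", "referred_to", "DR77"), ("DR42", "billed", "C900")],
}
prompt = build_audit_prompt(case)
```

Because every claim in the prompt maps back to a stored evidence item, an investigator can verify the resulting narrative line by line, which is the defensibility property the article argues for.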

Why serving architecture suddenly matters

Once teams start imagining this workflow, they often focus on the graph model and the LLM prompt but underestimate the serving challenge. Real investigations are not one-shot completions. They are interactive. A user asks for the suspicious ring. Then they ask for the strongest evidence path. Then they request a peer comparison. Then they add one more provider, narrow the time window, or ask for a cleaner summary for a formal audit note. This creates a sequence of short, related LLM turns that repeatedly reuse much of the same prompt context.

That is where KV cache behavior becomes a first-class design issue. Without efficient cache management, repeated investigative prompts become unnecessarily expensive. The system keeps recomputing shared prefixes, duplicating memory, and fragmenting cache space under mixed workloads. That is a problem for latency, throughput, and cost. It is especially painful in environments where healthcare organizations want predictable performance, tighter infrastructure control, and often some level of on-prem or private-cloud deployment.

Paged KV caches change the picture. Instead of forcing the system to manage long contiguous memory regions for every request, paged allocation breaks KV memory into smaller blocks that can be reused more efficiently. This reduces waste, helps avoid fragmentation, and supports larger effective batch sizes. When paired with prefix caching or prompt caching, the benefits grow further. Repeated investigation templates, repeated system prompts, and repeated evidence scaffolds do not need to be rebuilt from scratch each time. The system can reuse the shared context and spend its compute budget on the new parts of the query.

That has real product implications. Suddenly the investigation assistant feels responsive enough to be part of the analyst workflow rather than a slow back-office experiment. Time to first token improves. Follow-up turns feel lighter. Multi-hop graph exploration becomes less expensive. The cost curve changes from "interesting demo" to "something we can actually operationalize."
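The mechanics of prefix reuse can be shown with a toy allocator: the prompt is split into fixed-size blocks, each full block is keyed by a hash of everything up to and including it, and a follow-up request sharing the same prefix hits the cache instead of recomputing. This is a deliberately simplified model of what engines like vLLM do; block size, hashing scheme, and eviction are all glossed over.

```python
import hashlib

BLOCK = 4  # tokens per KV block (real systems use larger blocks, e.g. 16)

class PrefixCache:
    """Toy paged-prefix cache: blocks keyed by the hash of their full prefix."""

    def __init__(self):
        self.blocks = {}   # prefix_hash -> block_id
        self.next_id = 0

    def allocate(self, tokens):
        reused, fresh = [], []
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            # Key covers ALL tokens up to this block, so a match guarantees
            # the entire prefix is identical and safe to reuse.
            key = hashlib.sha1(" ".join(tokens[:end]).encode()).hexdigest()
            if key in self.blocks:
                reused.append(self.blocks[key])
            else:
                self.blocks[key] = self.next_id
                fresh.append(self.next_id)
                self.next_id += 1
        return reused, fresh

cache = PrefixCache()
system = ["sys"] * 8                      # shared system/evidence prefix
r1 = cache.allocate(system + ["q1"] * 4)  # first turn: every block is fresh
r2 = cache.allocate(system + ["q2"] * 4)  # follow-up: prefix blocks reused
```

The second turn only pays for the block containing the new question, which is why follow-up turns in an investigation session feel so much lighter than the first.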

Hybrid systems will win

The best collusive billing systems will not be purely rule-based, purely graph-based, or purely LLM-based. They will be hybrid systems. Rules are still valuable for known abuse patterns and policy constraints. Graph models are excellent at surfacing relational anomalies and coordinated behavior. Retrieval keeps explanations grounded in evidence and operational context. LLMs make the results understandable and actionable.

This layered design is important for trust. Payment integrity teams need accuracy, but they also need repeatability and defensibility. They need to know why a lead was generated, what evidence supports it, and how the explanation was formed. A hybrid stack supports that better than a monolithic black box. It also makes the system easier to evolve. New rules can be added. Graph schemas can expand. Retrieval sources can improve. Prompts can be tuned for different audiences. Serving can be optimized independently.
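One way to keep that layering defensible is to make every lead carry its own provenance: which rules fired, what the graph scored, and which evidence artifacts back it up. The sketch below is one possible shape for such a record; the weights, field names, and scoring formula are arbitrary assumptions, not a recommended calibration.

```python
def triage_lead(provider_id, rule_hits, graph_score, evidence_ids,
                rule_weight=0.4, graph_weight=0.6):
    """Combine rule and graph signals into one lead, keeping provenance."""
    # Cap rule contribution at three hits so a noisy rule set cannot dominate.
    score = rule_weight * min(len(rule_hits), 3) / 3 + graph_weight * graph_score
    return {
        "provider": provider_id,
        "score": round(score, 3),
        "why": {"rules": rule_hits, "graph_score": graph_score},
        "evidence": evidence_ids,   # subgraph / claim references for audit
    }

# Hypothetical lead: one rule hit plus a strong relational signal.
lead = triage_lead("DR42", ["upcoding_freq"], 0.8, ["SG-17", "C900"])
```

Because the "why" and "evidence" fields travel with the score, an auditor reviewing the lead months later can reconstruct exactly which layer contributed what.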

In short, the architecture becomes modular without becoming fragmented.

What success should look like

Evaluation should reflect real investigative value, not just abstract model quality. Precision at the top of the queue matters more than broad statistical elegance because investigators can only review so many leads. Time saved during triage matters because explanation is part of the outcome. False-positive harm matters because even a polished AI narrative is costly if it sends analysts down weak paths. Explanation quality matters because users must be able to verify the story against the underlying evidence.

A mature system should be measured on both technical and operational terms. Did it surface leads that traditional methods missed? Did it reduce the time required to understand a suspicious pattern? Did it improve consistency across investigators? Did it support defensible audit narratives? Did it lower review burden without increasing noise? Those are the questions that determine whether the technology is solving a real payment integrity problem.
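"Precision at the top of the queue" has a standard, easily computed form: of the top k leads handed to investigators, what fraction were confirmed? The provider IDs below are illustrative.

```python
def precision_at_k(ranked_leads, confirmed, k):
    """Fraction of the top-k ranked leads that were later confirmed."""
    top = ranked_leads[:k]
    return sum(1 for lead in top if lead in confirmed) / k

# Hypothetical review queue and confirmed outcomes.
ranked = ["DR42", "DR77", "DR13", "DR08"]
confirmed = {"DR42", "DR13"}
p_at_2 = precision_at_k(ranked, confirmed, 2)
```

If a team can only work two leads per day, precision@2 is the number that determines whether the queue is worth their time, regardless of how the model scores on broader statistical metrics.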

The bigger shift

The deeper lesson is that collusive billing detection is moving from isolated anomaly scoring toward relational intelligence. Healthcare fraud, waste, and abuse often emerge from systems of interaction, not from single bad records. Heterograph neural networks are a natural way to represent and learn from those systems. LLMs are a natural way to communicate what the graph has found. And modern cache-aware serving is what makes the whole workflow responsive enough for daily use.

That combination is powerful because it respects how these problems actually work. The graph handles structure. The evidence layer handles grounding. The LLM handles explanation. The serving layer handles performance. Put together, they create something much more valuable than another fraud score. They create an investigation companion that can surface suspicious networks, explain them clearly, and help human experts move faster with more confidence.

That is where this field is heading. Not toward AI that replaces investigators, but toward AI that helps them see the invisible structure behind collusive billing – and turns that structure into action.

Author Bio

Jimmy Joseph is an AI/ML engineer and researcher specializing in healthcare payment integrity, large-scale claims analytics, and applied machine learning systems. His work focuses on building practical AI solutions that operate under real-world enterprise constraints, including performance, governance, and auditability. He writes about deep learning, LLM infrastructure, healthcare AI, and scalable inference design, with a particular interest in turning advanced AI concepts into production-ready systems that deliver measurable operational value.
