Detecting Healthcare Fraud with ML

This is Part 2 of "The Compliance-First Fintech Playbook" series. Part 1 covered healthcare KYC/KYB requirements. Part 3 examines full compliance program costs.

At 2:47 AM our fraud detection system flagged a $3,200 lab equipment purchase from "MedSupply Express." It was the fourth such transaction that week, each from a different dental practice and each using a card issued minutes before the purchase. By 3:15 AM we had blocked $47,000 in coordinated fraud across 12 practices.

Traditional banking fraud systems would never have caught this. The individual transactions were under typical velocity thresholds. The merchants appeared legitimate. The purchasing patterns looked like normal supply orders.

Healthcare fraud has unique signatures that standard fintech platforms miss entirely. After analyzing fraud patterns across 777 practices, we learned that effective healthcare fraud detection requires understanding both the clinical workflow and the financial ecosystem that supports it. SAR workflows and audit trails come first. Models slot into that process.

Consumer banking fraud detection focuses on velocity, geography, and merchant category anomalies. Healthcare fraud exploits clinical purchasing patterns and practice operational flows that general-purpose systems do not understand.

After-hours lab order fraud: Legitimate dental labs close at 5-6 PM. In our dataset, 89% of orders placed at 10 PM with newly issued virtual cards turned out to be fraudulent. Standard fraud systems check velocity and spending limits. They do not understand healthcare business hours.
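
A minimal sketch of how an after-hours signal along these lines might be computed. The hours table, the six-hour decay, and the one-hour card-age cutoff are illustrative assumptions, not production values, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical opening hours per business type as (open_hour, close_hour), local time.
ASSUMED_HOURS = {"general_dentistry": (8, 18), "dental_lab": (7, 18)}


def business_hours_score(timestamp: datetime, business_type: str) -> float:
    """Return 1.0 inside assumed weekday hours, falling toward 0.0 with distance."""
    open_h, close_h = ASSUMED_HOURS.get(business_type, (8, 18))
    if timestamp.weekday() >= 5:  # Saturday/Sunday treated as closed (assumption)
        return 0.0
    hour = timestamp.hour + timestamp.minute / 60
    if open_h <= hour < close_h:
        return 1.0
    # Hours to the nearest edge of the business day, measured around the clock.
    gap = min((open_h - hour) % 24, (hour - close_h) % 24)
    return max(0.0, 1.0 - gap / 6.0)  # reaches 0.0 six hours from opening or closing


def is_after_hours_new_card(timestamp, card_issued_at, business_type,
                            max_card_age=timedelta(hours=1)):
    """Flag the combination described above: off-hours order plus a fresh card."""
    off_hours = business_hours_score(timestamp, business_type) < 0.5
    fresh_card = (timestamp - card_issued_at) <= max_card_age
    return off_hours and fresh_card
```

Neither signal is meant to decide anything on its own. In a setup like ours they would be two inputs among many to the model.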

Supply vendor impersonation: Fraudsters create shell companies with names like "Dental Supply Solutions" or "Healthcare Equipment LLC." To consumers, these look legitimate. To dental practices, these names do not match known suppliers like Patterson, Schein, or Benco. Our ML model learned legitimate supplier patterns from 777 practices' transaction history.
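
As a rough illustration of supplier name matching, the sketch below scores a merchant name against a short list of known suppliers using Python's standard `difflib`. The supplier list, the normalization rules, and the function name are assumptions for the example. The model described here learned supplier patterns from transaction history rather than from a fixed list.

```python
from difflib import SequenceMatcher

# Illustrative list only; a real deployment would load this from a maintained store.
KNOWN_SUPPLIERS = ["Patterson Dental", "Henry Schein", "Benco Dental"]


def _normalize(name: str) -> str:
    """Lowercase, drop simple punctuation, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def known_supplier_score(merchant_name: str, known=KNOWN_SUPPLIERS) -> float:
    """Best fuzzy-match ratio (0.0 to 1.0) between the merchant and any known supplier."""
    merchant = _normalize(merchant_name)
    return max(
        SequenceMatcher(None, merchant, _normalize(candidate)).ratio()
        for candidate in known
    )
```

A near-match that is not exact can itself be a warning sign, since impersonators may choose names close to real suppliers, so a score like this is probably best treated as a feature rather than a pass/fail check.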

Equipment financing exploitation: Fraudsters exploit the fact that dental equipment purchases range from $15,000 (digital X-ray) to $150,000 (cone beam CT scanner). They submit fraudulent applications with stolen practice information for mid-range equipment that does not trigger manual underwriting review.

Insurance reimbursement timing attacks: Legitimate practices show predictable cash flow patterns with low balances followed by insurance reimbursement deposits. Fraudsters exploit this by timing large purchases during historically low-balance periods, knowing practices expect deposits within 2-3 days.
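
One way to turn this timing pattern into a feature is to compare the purchase against the practice's own balance history. The sketch below is an assumption-laden illustration: the quartile cutoff, the halving for non-low periods, and the function name are all hypothetical choices.

```python
from statistics import quantiles


def cash_flow_timing_score(amount, current_balance, balance_history):
    """Score in [0, 1]: higher when a large purchase lands in a low-balance period.

    balance_history is a list of past daily balances (at least two values).
    """
    first_quartile = quantiles(balance_history, n=4)[0]
    in_low_period = current_balance <= first_quartile
    # Share of the available balance this purchase would consume, capped at 1.0.
    draw = min(1.0, amount / max(current_balance, 1.0))
    return draw if in_low_period else 0.5 * draw
```

For example, a $3,200 purchase against a $3,000 balance that sits in the bottom quartile of the practice's history scores at the maximum, while the same purchase against a $20,000 balance scores low.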

These patterns are invisible to traditional fraud detection because they require understanding healthcare operations.

Healthcare fraud detection requires feature engineering that captures clinical and operational context beyond standard transactional features.

Temporal features include time-of-day for specific transaction types, day-of-week patterns for lab orders versus supply purchases, equipment purchase timing relative to practice business hours, and insurance deposit intervals and payment timing patterns.

Practice operational features include provider NPI registration date versus account opening date, practice location type (strip mall versus medical building versus hospital), staff size indicators from payroll transaction patterns, and patient volume estimates from card transaction frequency.
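
To make the first of these features concrete, here is a small sketch of an NPI-age signal. The 180-day horizon and the linear decay are assumptions chosen for illustration, and the function names are hypothetical.

```python
from datetime import date


def npi_account_gap_days(npi_registered: date, account_opened: date) -> int:
    """Days between NPI registration and account opening (negative if the account came first)."""
    return (account_opened - npi_registered).days


def npi_age_risk(npi_registered: date, account_opened: date, horizon_days: int = 180) -> float:
    """Risk in [0, 1]: 1.0 for an NPI registered on or after account opening,
    decaying linearly to 0.0 once the NPI is at least horizon_days older than the account."""
    gap = npi_account_gap_days(npi_registered, account_opened)
    if gap < 0:
        return 1.0
    return max(0.0, 1.0 - gap / horizon_days)
```

A long-established NPI paired with a new account is unremarkable. An NPI registered days before the account was opened is the case this feature is meant to surface.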

Vendor ecosystem features include supplier name matching against known healthcare vendors, shipping address consistency with practice location, equipment model numbers matching practice specialty, and lab order frequency matching practice patient volume.

Financial flow features include insurance reimbursement patterns and timing, payroll regularity and staff count implications, equipment financing payment consistency, and supply purchasing seasonality (back-to-school, holiday patterns).

The model architecture uses an ensemble approach.

XGBoost for tabular features:

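# Healthcare-specific feature engineering
# The scoring helpers called below are assumed to be defined elsewhere.
from xgboost import XGBClassifier

def engineer_healthcare_features(transaction_data, practice_data, known_suppliers_db):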
    features = {}

    # Business hours scoring
    features['business_hours_score'] = calculate_business_hours_score(
        transaction_data.timestamp,
        practice_data.specialty
    )

    # Supplier validation
    features['known_supplier_score'] = validate_supplier_name(
        transaction_data.merchant_name,
        practice_data.specialty,
        known_suppliers_db
    )

    # Cash flow timing
    features['cash_flow_timing_score'] = analyze_cash_flow_timing(
        transaction_data.amount,
        practice_data.recent_deposits,
        practice_data.typical_cycle
    )

    return features

# XGBoost model for fraud scoring
xgb_model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic'
)

LSTM for sequence modeling:

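# Transaction sequence analysis
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.metrics import Precision, Recall

def build_sequence_model(n_features, window=30):
    # Input: one window of `window` days with n_features values per day
    model = Sequential([
        LSTM(128, return_sequences=True, input_shape=(window, n_features)),
        Dropout(0.2),
        LSTM(64, return_sequences=False),
        Dropout(0.2),
        Dense(32, activation='relu'),
        Dense(1, activation='sigmoid')
    ])

    # Precision and recall are passed as metric objects;
    # accuracy alone says little on imbalanced fraud data
    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['accuracy', Precision(), Recall()]
    )

    return model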

# Analyze 30-day transaction windows
sequence_features = create_sequence_features(
    transactions_30d,
    practice_operational_data
)

Graph Neural Network for practice relationships:

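# Practice and supplier relationship modeling
import networkx as nx

def build_practice_graph(practices, suppliers, transactions):
    # Nodes: practices, suppliers, equipment vendors
    # Edges: transaction relationships, geographic proximity
    # This builds the input graph only; the GNN that scores it is not shown.

    practice_supplier_graph = nx.Graph()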

    # Add practice nodes with features
    for practice in practices:
        practice_supplier_graph.add_node(
            practice.id,
            node_type='practice',
            specialty=practice.specialty,
            location=practice.location,
            risk_score=practice.risk_score
        )

    # Add supplier nodes
    for supplier in suppliers:
        practice_supplier_graph.add_node(
            supplier.id,
            node_type='supplier',
            legitimacy_score=supplier.legitimacy_score
        )

    # Add transaction edges with weights
    for transaction in transactions:
        practice_supplier_graph.add_edge(
            transaction.practice_id,
            transaction.supplier_id,
            weight=transaction.frequency,
            amount_avg=transaction.amount_avg
        )

    return practice_supplier_graph
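

One question a practice-supplier graph can answer is whether a single merchant is suddenly receiving payments from several unrelated practices on freshly issued cards, the pattern from the opening example. The sketch below expresses that check with plain dictionaries rather than a graph library or a trained GNN. The field names, the thresholds, and the function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_coordinated_merchants(transactions, min_practices=3,
                               max_card_age=timedelta(minutes=30),
                               window=timedelta(days=7)):
    """Return {merchant: [practice_ids]} for merchants paid by several practices
    using newly issued cards within the window.

    Each transaction is a dict with keys: merchant, practice_id,
    timestamp, card_issued_at (the last two are datetimes).
    """
    fresh_by_merchant = defaultdict(list)
    for txn in transactions:
        if txn["timestamp"] - txn["card_issued_at"] <= max_card_age:
            fresh_by_merchant[txn["merchant"]].append(txn)

    flagged = {}
    for merchant, txns in fresh_by_merchant.items():
        latest = max(t["timestamp"] for t in txns)
        practices = {
            t["practice_id"] for t in txns if latest - t["timestamp"] <= window
        }
        if len(practices) >= min_practices:
            flagged[merchant] = sorted(practices)
    return flagged
```

A graph model can generalize beyond a hand-written rule like this, but the rule is a useful baseline and is easy to explain in an investigation.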

The real-time scoring pipeline handles transaction authorization with sub-second decisions:

import logging
import time

import numpy as np

logger = logging.getLogger(__name__)

def real_time_fraud_scoring(transaction, practice_context):
    # Sub-second scoring requirement
    start_time = time.perf_counter()

    # Feature extraction (< 50ms)
    features = extract_features(transaction, practice_context)

    # ML model inference (< 100ms)
    xgb_score = xgb_model.predict_proba([features.tabular])[0][1]

    # Sequence model for historical context (< 100ms)
    # Keras expects a leading batch dimension: (1, window, n_features)
    sequence_batch = np.expand_dims(features.sequence, axis=0)
    sequence_score = float(lstm_model.predict(sequence_batch)[0][0])

    # Graph context (< 50ms)
    graph_score = graph_model.score_transaction(
        transaction, practice_context
    )

    # Ensemble scoring
    final_score = (
        0.5 * xgb_score +
        0.3 * sequence_score +
        0.2 * graph_score
    )

    # Total processing time < 300ms; log when the budget is exceeded
    elapsed_ms = (time.perf_counter() - start_time) * 1000
    if elapsed_ms > 300:
        logger.warning("Fraud scoring took %.0fms (budget 300ms)", elapsed_ms)

    # Decision thresholds
    if final_score > 0.85:
        return "BLOCK", final_score
    elif final_score > 0.65:
        return "REVIEW", final_score
    else:
        return "APPROVE", final_score
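

A worked example may help with the ensemble arithmetic. The weights and thresholds below are the ones used in the pipeline. The handling of a missing component score (renormalizing the remaining weights, and falling back to review when nothing is available) is an assumption added for illustration, not something the pipeline above does.

```python
WEIGHTS = {"xgb": 0.5, "sequence": 0.3, "graph": 0.2}
BLOCK_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.65


def ensemble_decision(scores):
    """Combine component scores into (decision, final_score).

    scores maps component name to a probability, or None if that model
    did not respond in time (hypothetical fallback behaviour).
    """
    available = {name: s for name, s in scores.items() if s is not None}
    if not available:
        return "REVIEW", None  # assumption: no signal means a human looks at it

    total_weight = sum(WEIGHTS[name] for name in available)
    final_score = sum(WEIGHTS[name] * s for name, s in available.items()) / total_weight

    if final_score > BLOCK_THRESHOLD:
        return "BLOCK", final_score
    if final_score > REVIEW_THRESHOLD:
        return "REVIEW", final_score
    return "APPROVE", final_score
```

With scores of 0.9, 0.8 and 0.7, the weighted sum is 0.45 + 0.24 + 0.14 = 0.83, which falls in the review band. Renormalizing after a dropped component can shift a transaction across a threshold, so any fallback of this kind deserves its own validation.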

Standard device fingerprinting tracks browser attributes and network information. Healthcare fraud requires understanding device usage patterns specific to clinical workflows.

Most dental practices use integrated POS systems (Dentrix, Eaglesoft, Open Dental) that create predictable device fingerprints. Transactions from generic web browsers or mobile apps during business hours often indicate fraud. Legitimate practices typically use business-class internet with static IP addresses and consistent network equipment. Residential IP addresses during business hours or frequent IP changes flag potential account takeovers. Dental practices use specialized equipment (card readers, signature pads, receipt printers) that create unique USB device signatures. Transactions without these peripherals during patient visits indicate potential fraud.
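
The network-consistency idea can be approximated server-side from session history alone. The sketch below is a simplified assumption: it does not classify residential versus business addresses (that would need an external data source), and the 0.8 stability cutoff and function names are hypothetical.

```python
from collections import Counter


def ip_stability(recent_ips):
    """Share of recent sessions that came from the single most common IP."""
    if not recent_ips:
        return 0.0
    top_count = Counter(recent_ips).most_common(1)[0][1]
    return top_count / len(recent_ips)


def network_flags(recent_ips, current_ip, min_stability=0.8):
    """Return a list of network-related flags for the current session."""
    flags = []
    if recent_ips and current_ip not in set(recent_ips):
        flags.append("unseen_ip")
    if recent_ips and ip_stability(recent_ips) < min_stability:
        flags.append("frequent_ip_changes")
    return flags
```

Accounts with no session history return no flags here, since there is nothing to compare against.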

// Healthcare-specific device fingerprinting
async function generateHealthcareDeviceFingerprint() {
    const fingerprint = {
        // Standard signals
        userAgent: navigator.userAgent,
        screenResolution: `${screen.width}x${screen.height}`,
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,

        // Healthcare-specific signals
        pmsIntegration: detectPMSIntegration(),
        // USB enumeration is asynchronous, so wait for the count before hashing
        peripheralDevices: await detectUSBPeripherals(),
        networkClass: classifyNetworkInfrastructure(),
        businessHoursContext: calculateBusinessHoursContext()
    };

    return hashFingerprint(fingerprint);
}

function detectPMSIntegration() {
    // Check for common PMS system markers
    const pmsIndicators = [
        'DentrixConnector',
        'EaglesoftBridge',
        'OpenDentalAPI'
    ];

    return pmsIndicators.some(indicator =>
        window[indicator] || document.querySelector(`[data-pms="${indicator}"]`)
    );
}

async function detectUSBPeripherals() {
    // WebUSB only lists devices this origin has already been granted access to
    if (!('usb' in navigator)) {
        return 0;
    }

    const devices = await navigator.usb.getDevices();
    return devices.filter(device =>
        isHealthcarePeripheral(device.vendorId, device.productId)
    ).length;
}

Traditional velocity checks use simple thresholds: $X per day, Y transactions per hour. Healthcare practices have predictable but complex spending patterns that require contextual velocity modeling.

General dentistry daily supply spending runs $200-800. Lab orders come 2-8 per day at $150-400 each. Equipment purchases happen monthly at $2,000-15,000. Insurance deposits arrive 2-3x weekly at $5,000-25,000.

Oral surgery has higher supply costs due to surgical materials, equipment purchases that include specialized surgical tools, different lab relationships (pathology versus prosthetics), and payment patterns that include hospital facility fees.

Orthodontics involves bulk supply purchases for appliances, quarterly equipment maintenance, different lab relationships focused on appliance fabrication, and consistent patient payment plans.
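
The general dentistry ranges quoted above can be written down as a small profile, which makes the implied totals easy to check. The dictionary layout and helper names are illustrative assumptions. Only the numbers come from the figures above.

```python
# Ranges quoted above for general dentistry; the structure is illustrative.
GENERAL_DENTISTRY = {
    "supplies_daily_usd": (200, 800),
    "lab_orders_per_day": (2, 8),
    "lab_order_usd": (150, 400),
    "equipment_monthly_usd": (2_000, 15_000),
    "insurance_deposits_per_week": (2, 3),
    "insurance_deposit_usd": (5_000, 25_000),
}


def implied_daily_lab_spend(profile):
    """Lowest and highest daily lab spend implied by order count and order size."""
    min_orders, max_orders = profile["lab_orders_per_day"]
    min_usd, max_usd = profile["lab_order_usd"]
    return min_orders * min_usd, max_orders * max_usd


def within_range(value, bounds):
    """True if value falls inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high
```

For this profile the implied daily lab spend runs from $300 to $3,200. A single $3,200 lab order therefore equals a full day's upper bound while sitting far outside the $150-400 per-order range, which is the kind of distinction a flat daily threshold cannot make.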

Dynamic velocity modeling:

from datetime import datetime

def calculate_dynamic_velocity_limits(practice, transaction_type):
    # Base limits by practice specialty
    base_limits = get_specialty_base_limits(practice.specialty)

    # Adjust for practice size
    size_multiplier = calculate_size_multiplier(
        practice.patient_volume,
        practice.staff_count
    )

    # Seasonal adjustments
    seasonal_multiplier = get_seasonal_multiplier(
        practice.specialty,
        datetime.now().month
    )

    # Recent pattern analysis
    pattern_multiplier = analyze_recent_patterns(
        practice.id,
        transaction_type,
        days_lookback=30
    )

    dynamic_limit = (
        base_limits[transaction_type] *
        size_multiplier *
        seasonal_multiplier *
        pattern_multiplier
    )

    return {
        'daily_amount': dynamic_limit * 1.0,
        'weekly_amount': dynamic_limit * 5.5,
        'monthly_amount': dynamic_limit * 22,
        'transaction_count_hourly': calculate_transaction_limits(dynamic_limit)
    }

# Example velocity check implementation
def check_velocity_limits(transaction, practice):
    limits = calculate_dynamic_velocity_limits(practice, transaction.type)

    current_usage = get_current_period_usage(practice.id, transaction.type)

    violations = []

    if current_usage.daily_amount + transaction.amount > limits['daily_amount']:
        violations.append('daily_amount_exceeded')

    if current_usage.hourly_count >= limits['transaction_count_hourly']:
        violations.append('hourly_count_exceeded')

    return violations
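

To see how the multipliers compound, here is the same limit arithmetic with the helper lookups replaced by plain numbers. The multiplier values in the example are made up. The 5.5 and 22 factors are taken from the function above and presumably reflect working days per week and per month.

```python
def dynamic_limits(base_daily, size_multiplier, seasonal_multiplier, pattern_multiplier):
    """Scale a base daily limit and derive weekly and monthly limits from it."""
    daily = base_daily * size_multiplier * seasonal_multiplier * pattern_multiplier
    return {
        "daily_amount": daily,
        "weekly_amount": daily * 5.5,
        "monthly_amount": daily * 22,
    }


def would_exceed_daily(current_daily_usage, amount, limits):
    """True if adding this transaction pushes the day's total past the daily limit."""
    return current_daily_usage + amount > limits["daily_amount"]
```

Starting from an $800 base with hypothetical multipliers of 1.5 (size), 1.0 (season) and 1.25 (recent pattern), the daily limit comes to $1,500, the weekly limit to $8,250, and the monthly limit to $33,000.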

Suspicious Activity Report (SAR) filing becomes complex in healthcare contexts because legitimate clinical activities can appear suspicious to traditional banking criteria.

Healthcare practices naturally collect many small payments rather than a few large cash deposits. A practice receiving many $200-300 cash payments for dental work can look like it is structuring deposits, but the pattern reflects normal patient co-pays.

Practices near borders (Texas-Mexico, Washington-Canada) have legitimate cross-border patient flows. Canadian patients paying U.S. dental practices or Mexican nationals receiving dental tourism services create complex reporting scenarios.

Traditional SAR monitoring flags transactions involving high-risk countries. Healthcare practices treating immigrant populations or providing charitable care in underserved areas trigger these flags despite legitimate clinical reasons.

Healthcare providers may qualify for certain BSA exemptions, but fintech platforms must carefully document the basis for each exemption and monitor for changes in practice operations.

SAR filing decision framework:

class HealthcareSARAnalysis:
    def __init__(self):
        self.clinical_exemptions = load_clinical_exemptions()
        self.geographic_contexts = load_geographic_contexts()
        self.professional_service_rules = load_professional_rules()

    def evaluate_suspicious_activity(self, activity_pattern, practice_context):
        # Initial suspicion scoring
        base_suspicion_score = self.calculate_base_suspicion(activity_pattern)

        # Healthcare context adjustments
        clinical_adjustment = self.apply_clinical_context(
            activity_pattern,
            practice_context
        )

        geographic_adjustment = self.apply_geographic_context(
            activity_pattern,
            practice_context.location
        )

        professional_adjustment = self.apply_professional_service_context(
            activity_pattern,
            practice_context.credentials
        )

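        final_score = (
            base_suspicion_score +
            clinical_adjustment +
            geographic_adjustment +
            professional_adjustment
        )

        # Record each component so the decision can be audited later
        reasoning = {
            'base_suspicion_score': base_suspicion_score,
            'clinical_adjustment': clinical_adjustment,
            'geographic_adjustment': geographic_adjustment,
            'professional_adjustment': professional_adjustment,
        }

        # SAR filing decision
        if final_score > SAR_FILING_THRESHOLD:
            return self.prepare_sar_filing(activity_pattern, practice_context)
        else:
            return self.document_no_action_decision(final_score, reasoning)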

    def apply_clinical_context(self, pattern, context):
        # Adjust for legitimate clinical activities
        adjustments = 0

        # Cash payment patterns common in healthcare
        if pattern.type == 'cash_deposits' and pattern.amount_range == (200, 400):
            if context.specialty in ['general_dentistry', 'family_medicine']:
                adjustments -= 0.3  # Reduce suspicion

        # Equipment purchase patterns
        if pattern.type == 'large_purchases':
            if self.validate_equipment_purchase(pattern, context):
                adjustments -= 0.4  # Legitimate equipment purchase

        # Cross-border healthcare services
        if pattern.type == 'cross_border_activity':
            if context.location in BORDER_REGIONS:
                adjustments -= 0.2  # Dental tourism context

        return adjustments
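

Because audit trails come first, a no-action decision is worth persisting in a form a reviewer can read later. The record below is one possible shape, offered as a sketch: the class, its field names, and the 0.7 threshold used in the usage note are hypothetical. The adjustment values in the usage note mirror those in `apply_clinical_context`.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SARDecisionRecord:
    practice_id: str
    base_score: float
    adjustments: dict  # adjustment name -> signed value
    threshold: float   # hypothetical stand-in for SAR_FILING_THRESHOLD
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def final_score(self) -> float:
        return self.base_score + sum(self.adjustments.values())

    @property
    def decision(self) -> str:
        return "FILE_SAR" if self.final_score > self.threshold else "NO_ACTION"

    def to_json(self) -> str:
        """Serialize the inputs and the derived outcome for the audit log."""
        payload = asdict(self)
        payload["final_score"] = round(self.final_score, 4)
        payload["decision"] = self.decision
        return json.dumps(payload, sort_keys=True)
```

For instance, a base score of 0.9 with a clinical adjustment of -0.3 and a geographic adjustment of -0.2 yields 0.4, which under an assumed threshold of 0.7 is recorded as a no-action decision along with every component that led to it.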

Healthcare fintechs must establish protocols with FinCEN for healthcare-specific reporting nuances. For complex healthcare scenarios, consultation with FinCEN's Financial Institutions Hotline (866-556-3974) can clarify reporting requirements before filing. FinCEN maintains industry-specific guidance for healthcare BSA compliance. Healthcare fintechs should establish relationships with appropriate FinCEN analysts. Healthcare SAR filings require additional documentation about clinical context, patient care rationale, and professional service delivery that standard SARs do not address.

The most effective healthcare fraud prevention combines technical detection with practice education about emerging threats.

Practice-facing fraud alerts:

def generate_practice_fraud_alert(practice, threat_type, context):
    alerts = {
        'supply_vendor_impersonation': {
            'title': 'New Supplier Verification Required',
            'message': f'''
            We've detected a payment attempt to "{context.merchant_name}"
            which doesn't match your usual suppliers (Patterson, Schein, Benco).

            Please verify:
            - Is this a legitimate new supplier?
            - Did you initiate contact with them?
            - Do they have proper healthcare industry credentials?

            Contact us at [phone] if you need to authorize this payment.
            ''',
            'urgency': 'high'
        },

        'after_hours_activity': {
            'title': 'Unusual After-Hours Transaction',
            'message': f'''
            We've flagged a ${context.amount:,.2f} transaction at {context.time}
            which is outside your typical business hours.

            If this was authorized by your practice:
            - Reply "AUTHORIZE" to approve this payment
            - All future payments to this vendor will be approved

            If this was not authorized:
            - We've temporarily blocked the payment
            - Please call us immediately at [phone]
            ''',
            'urgency': 'immediate'
        }
    }

    # Only build the generic alert when no specific template matches
    if threat_type in alerts:
        return alerts[threat_type]
    return generate_generic_alert(context)

Monthly briefings to practices about emerging fraud patterns help prevent account compromises. The Q1 2025 brief covered shell supply companies targeting dental practices, equipment financing application fraud using stolen NPI numbers, business email compromise targeting practice administrators, and patient payment fraud using stolen insurance information.

Healthcare fraud detection requires sub-second decision-making with high accuracy and explainability for regulatory reporting.

Latency targets: feature extraction under 50ms, ML model inference under 100ms, rule engine evaluation under 25ms, decision logging under 25ms. Total authorization decision under 200ms.
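
A per-stage budget like this is easiest to enforce when each stage is timed separately. The sketch below uses the budget figures above. The `StageTimer` class and its method names are hypothetical.

```python
import time
from contextlib import contextmanager

# Stage budgets in milliseconds, taken from the latency targets above.
BUDGET_MS = {
    "feature_extraction": 50,
    "model_inference": 100,
    "rule_engine": 25,
    "decision_logging": 25,
}
TOTAL_BUDGET_MS = 200


class StageTimer:
    """Collects wall-clock time per named stage and reports budget overruns."""

    def __init__(self):
        self.elapsed_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.elapsed_ms[name] = (time.perf_counter() - start) * 1000

    def over_budget(self):
        """Names of stages whose measured time exceeded their budget."""
        return [
            name for name, ms in self.elapsed_ms.items()
            if ms > BUDGET_MS.get(name, float("inf"))
        ]
```

Wrapping each step in `with timer.stage("model_inference"):` keeps the measurement next to the code it measures, and the four stage budgets add up to the 200ms total.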

Accuracy targets: false positive rate under 2% (minimize practice disruption), false negative rate under 0.5% (minimize fraud losses), explainability at 100% (required for SAR filings).
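
For reference, the two error rates can be computed from confusion-matrix counts as follows. The function names are illustrative and the default targets are the ones stated above.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate) from confusion counts.

    FPR = FP / (FP + TN): share of legitimate transactions wrongly flagged.
    FNR = FN / (FN + TP): share of fraudulent transactions missed.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


def meets_targets(tp, fp, tn, fn, max_fpr=0.02, max_fnr=0.005):
    """True if both rates are strictly under their targets."""
    fpr, fnr = error_rates(tp, fp, tn, fn)
    return fpr < max_fpr and fnr < max_fnr
```

With 150 false positives among 10,000 legitimate transactions and 4 misses among 1,000 fraudulent ones, the rates are 1.5% and 0.4%, which satisfies both targets.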

Scalability requirements: 1,000+ transactions per second per practice, real-time model updates without downtime, geographic distribution for sub-50ms latency, audit trail retention for 5+ years.

Production infrastructure:

# Infrastructure architecture
fraud_detection_service:
  load_balancer:
    type: "Application Load Balancer"
    health_check_path: "/health"

  application:
    container_count: 12
    cpu_limit: "2000m"
    memory_limit: "4Gi"
    autoscaling:
      min_replicas: 6
      max_replicas: 50
      target_cpu: 70%

  ml_models:
    xgboost:
      inference_endpoint: "/models/xgb/predict"
      model_version: "v2.1.3"
      latency_sla: "100ms"

    lstm:
      inference_endpoint: "/models/lstm/predict"
      model_version: "v1.8.2"
      latency_sla: "150ms"

    graph_neural_net:
      inference_endpoint: "/models/gnn/predict"
      model_version: "v1.2.1"
      latency_sla: "75ms"

  data_stores:
    feature_store:
      type: "Redis Cluster"
      node_count: 6
      memory_per_node: "16GB"

    transaction_history:
      type: "PostgreSQL"
      instance_type: "db.r5.2xlarge"
      read_replicas: 3

    model_artifacts:
      type: "S3"
      versioning: true
      lifecycle_policy: "retain_5_years"

Effective fraud detection creates feedback loops that improve both technical accuracy and practice operational security. The model improvement cycle runs through detection (real-time fraud flagging with confidence scores), investigation (manual review with healthcare context analysis), labeling (ground truth establishment for confirmed fraud and legitimate activity), retraining (monthly model updates with new labeled data), and deployment (A/B testing for model performance validation).

Practice security improvement runs through pattern sharing (aggregate fraud insights across practice network), education (targeted security training based on practice vulnerabilities), process improvement (operational recommendations to reduce fraud exposure), and vendor verification (maintained whitelist of legitimate healthcare suppliers).

Healthcare fintechs that understand fraud at this technical level can offer superior protection while maintaining operational efficiency for legitimate clinical activities.

Next in series

Part 3 - examines the full cost structure of compliance programs and why most healthcare fintechs underestimate these requirements by 3-5x.

Previous in series

Part 1 - healthcare KYC/KYB realities and why they cost 3-4x consumer KYC.

Data sources: Internal fraud detection analysis across 777 practices, FinCEN SAR filing guidance, healthcare fraud pattern analysis from Q3-Q4 2024 operational data
