Underwriting — the assessment and pricing of insurance risks — has traditionally been a labor-intensive, manual process. AI is fundamentally changing this: machine learning models can learn from millions of historical loss records, recognize patterns invisible to humans, and deliver risk assessments in seconds rather than days. This article shows how insurance companies can build an AI-driven underwriting process on AWS — including the regulatory requirements from BaFin and the EU AI Act for algorithmic decision-making.

Why AI Underwriting Now

The European insurance market faces significant cost pressure. Loss ratios are rising due to climate events (extreme weather, flooding), inflation (higher repair costs), and demographic shifts (age-related healthcare costs). At the same time, InsurTechs like Wefox, Element, and digital-native competitors are increasing competitive pressure through radical digitalization.

Gartner forecasts that by 2027, over 50% of all non-life underwriting decisions will be supported or made by AI models (Gartner Insurance Technology Trends). For insurers, this means: those who don't invest in AI underwriting today will lose competitiveness within a few years.

Quantified benefits of AI underwriting:

  • 30–50% reduction in processing time per application
  • 10–20% improvement in loss ratio through more accurate risk assessment
  • Scalability: AI models can handle thousands of applications simultaneously
  • Consistency: no day-to-day variations as with human underwriters

Step 1: Building a Data Lake for Underwriting Data

No AI underwriting without data. The first step is building a structured data lake that consolidates all relevant data sources.

Typical data sources for an insurance data lake:

| Data Source | Insurance Line | AWS Ingestion |
| --- | --- | --- |
| Historical loss data (structured) | All lines | AWS Glue, AWS DMS |
| Motor telematics (IoT) | Motor insurance | AWS IoT Core, Amazon Kinesis |
| Weather data / climate models | Property, motor, agriculture | Amazon S3 (external APIs) |
| Building / cadastral data | Residential, commercial property | Amazon S3, AWS Glue |
| Medical data | Health, life insurance | HL7 FHIR on AWS, Amazon HealthLake |
| External scoring services | All lines | API Gateway, Lambda, S3 |
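
As a minimal sketch of the telematics path above: a driving event can be packaged as a Kinesis `PutRecord` payload before ingestion. The stream name and event fields here are illustrative assumptions, not a fixed schema.

```python
import json

# Hypothetical stream name and event schema for motor telematics ingestion.
TELEMATICS_STREAM = "motor-telematics-events"

def build_telematics_record(vehicle_id: str, speed_kmh: float,
                            harsh_brake_events: int, trip_km: float) -> dict:
    """Build a Kinesis PutRecord payload for one telematics event.

    Partitioning by vehicle_id keeps each vehicle's events ordered
    within a shard.
    """
    event = {
        "vehicle_id": vehicle_id,
        "speed_kmh": speed_kmh,
        "harsh_brake_events": harsh_brake_events,
        "trip_km": trip_km,
    }
    return {
        "StreamName": TELEMATICS_STREAM,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": vehicle_id,
    }

# In production this payload would be sent via
# boto3.client("kinesis").put_record(**record); omitted here so the
# sketch stays self-contained.
record = build_telematics_record("WVW-123", 87.5, 1, 12.3)
```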

Amazon S3 forms the storage foundation, AWS Lake Formation manages data access with role-based controls, and AWS Glue orchestrates ETL processes from heterogeneous source systems. Amazon Macie automatically detects and reports on sensitive personal data in the data lake — critical for GDPR compliance with health and behavioral data.
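
Role-based, column-level access via Lake Formation can be sketched as follows. The database, table, and role ARN are hypothetical; the dict mirrors the request shape of the `lakeformation.grant_permissions` API, here used to hide personal-data columns from an analyst role.

```python
def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: list[str]) -> dict:
    """Build a GrantPermissions request that exposes only the listed
    columns to the given principal (e.g. excluding personal data)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }

grant = build_column_grant(
    "arn:aws:iam::123456789012:role/underwriting-analyst",
    "underwriting", "claims_history",
    ["claim_id", "loss_amount", "loss_date"],  # no personal identifiers
)
# boto3.client("lakeformation").grant_permissions(**grant) would apply it.
```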

Step 2: ML Pipeline with Amazon SageMaker

The ML pipeline for AI underwriting follows a standardized process that Amazon SageMaker fully orchestrates:

  1. Feature Engineering: SageMaker Feature Store centrally manages all calculated features. Important features for motor underwriting: driver profile (age, driving experience), vehicle characteristics (make, power, age), claims history, zip code–based risk classification, telematics driving style score. The Feature Store ensures identical feature calculations in training and inference (avoiding training-serving skew).
  2. Model Training: SageMaker Training Jobs train models on historical data. Gradient Boosting algorithms (XGBoost, LightGBM) are well-suited for underwriting models due to their high performance on tabular data and natural interpretability.
  3. Model Evaluation: For underwriting models, not just predictive accuracy matters — fairness and calibration are also critical. A calibrated model predicts correct probabilities (e.g., "20% loss probability" actually corresponds to 20% loss rate in that group). SageMaker Clarify conducts bias analyses and explainability reports.
  4. Deployment: SageMaker Real-Time Endpoints expose the model as an API. An underwriting request sends application data to the endpoint and receives a risk estimate and suggested premium within milliseconds.
  5. Monitoring: SageMaker Model Monitor continuously tracks data quality (data drift) and prediction quality (concept drift). If the model deviates significantly, automatic retraining is initiated.
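
The calibration criterion from step 3 can be checked with a simple bucketing comparison: group predictions by predicted probability and compare against the observed loss rate in each bucket. A minimal sketch with synthetic data (no SageMaker dependency):

```python
from collections import defaultdict

def calibration_table(predicted: list[float], observed: list[int],
                      bucket_width: float = 0.1) -> dict:
    """For each probability bucket, compare the mean predicted loss
    probability with the observed loss rate. A well-calibrated model
    shows a small gap in every bucket."""
    buckets = defaultdict(lambda: {"pred_sum": 0.0, "losses": 0, "n": 0})
    n_buckets = int(1 / bucket_width)
    for p, y in zip(predicted, observed):
        b = min(int(p / bucket_width), n_buckets - 1)
        buckets[b]["pred_sum"] += p
        buckets[b]["losses"] += y
        buckets[b]["n"] += 1
    return {
        b: {
            "mean_predicted": v["pred_sum"] / v["n"],
            "observed_rate": v["losses"] / v["n"],
        }
        for b, v in buckets.items()
    }

# Synthetic example: 10 policies scored at 20% loss probability,
# of which 2 actually had a loss -> perfectly calibrated bucket.
preds = [0.2] * 10
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
table = calibration_table(preds, outcomes)
```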

Regulatory Requirements: BaFin, GDPR Art. 22, and EU AI Act

AI underwriting does not exist in a regulatory vacuum. Three regulatory layers are relevant:

GDPR Art. 22 — Right to Explanation
GDPR Art. 22 grants data subjects the right not to be subject to a decision based solely on automated processing that produces legal or similarly significant effects, which covers an automated rejection or premium increase. In practice, the customer is entitled to meaningful information about the logic involved (Arts. 13–15) and to human review of the decision. Amazon SageMaker Clarify delivers SHAP-based (SHapley Additive exPlanations) feature importances as the technical foundation for such explanations. Important: the explanation must be understandable to laypersons; raw SHAP values alone are insufficient.
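
How SHAP output can be translated into a lay-readable explanation is sketched below. The feature names and values are invented for illustration; in practice the contributions would come from a SageMaker Clarify explainability report.

```python
def explain_for_customer(shap_values: dict[str, float], top_k: int = 3) -> list[str]:
    """Turn raw per-feature SHAP contributions into plain-language
    statements, ordered by absolute impact on the predicted risk."""
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    sentences = []
    for feature, value in ranked[:top_k]:
        direction = "increased" if value > 0 else "lowered"
        sentences.append(f"Your {feature} {direction} the assessed risk.")
    return sentences

# Invented example values for a motor application:
shap = {
    "claims history (2 claims in 3 years)": 0.18,
    "vehicle power": 0.07,
    "driving experience (12 years)": -0.05,
    "zip code risk class": 0.02,
}
explanation = explain_for_customer(shap)
```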
BaFin Requirements for Algorithmic Systems
BaFin expects algorithmic systems in insurance to be documented, auditable, and non-discriminatory. This includes: complete model documentation (training data, algorithm, hyperparameters), regular checks for impermissible discriminatory features (gender, nationality, religion are explicitly prohibited), backtesting of model quality, and a governance structure for model approval and deployment.
EU AI Act — High-Risk AI System
The EU AI Act (in force since August 2024) classifies AI systems used for risk assessment and pricing in life and health insurance as high-risk AI systems (Annex III, point 5(c)). For affected systems this means: technical documentation, a conformity assessment, registration in the EU database for high-risk AI systems, instructions for human oversight, and ongoing post-market monitoring.

AWS supports compliance with these requirements through: SageMaker Clarify (bias, explainability), SageMaker Model Cards (model documentation), SageMaker Model Monitor (ongoing monitoring), and AWS Audit Manager (audit evidence collection for regulatory proof).
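
The model documentation piece can be started programmatically with SageMaker Model Cards. The sketch below builds the `Content` JSON for `create_model_card`; the field names follow the Model Card schema as we understand it, so treat the exact keys (and the model name) as assumptions to validate against the current AWS schema.

```python
import json

def build_model_card_content(model_name: str, purpose: str,
                             risk_rating: str) -> str:
    """Assemble the Content JSON for sagemaker.create_model_card.
    Keys are based on the published Model Card schema; verify against
    the current AWS documentation before use."""
    content = {
        "model_overview": {
            "model_description": f"{model_name}: AI underwriting risk model",
        },
        "intended_uses": {
            "purpose_of_model": purpose,
            "risk_rating": risk_rating,
        },
    }
    return json.dumps(content)

card_json = build_model_card_content(
    "motor-uw-xgb-v3",  # hypothetical model name
    "Risk scoring and premium suggestion for motor applications",
    "High",  # high-risk classification under the EU AI Act
)
# boto3.client("sagemaker").create_model_card(
#     ModelCardName="motor-uw-xgb-v3",
#     Content=card_json,
#     ModelCardStatus="Draft",
# )
```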

Real-Time Underwriting vs. Batch Scoring

Not every underwriting scenario requires real-time scoring. The choice depends on the use case:

| Scenario | Recommended Approach | AWS Service |
| --- | --- | --- |
| Online premium calculator (direct sales) | Real-time inference | SageMaker Real-Time Endpoint |
| Portfolio review (annual repricing) | Batch scoring | SageMaker Batch Transform |
| Loss prevention (IoT telematics) | Real-time stream | Amazon Kinesis + Lambda + SageMaker |
| Broker underwriting (complex risks) | Asynchronous (< 1 minute) | SageMaker Asynchronous Inference |
| Reinsurance (portfolio analysis) | Batch | SageMaker Batch + EMR |
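
The decision logic behind this choice can be captured in a small routing helper. The thresholds below are illustrative assumptions, not AWS guidance, except for the 6 MB payload limit, which is the documented cap for SageMaker real-time endpoint requests.

```python
from typing import Optional

def choose_inference_mode(latency_sla_ms: Optional[int], payload_mb: float,
                          is_stream: bool = False) -> str:
    """Map a scenario's requirements to a SageMaker inference option.
    Thresholds are illustrative; real cutoffs depend on model size
    and traffic patterns."""
    if is_stream:
        return "kinesis+lambda+real-time-endpoint"   # e.g. IoT telematics
    if latency_sla_ms is None:
        return "batch-transform"                     # e.g. annual repricing
    if latency_sla_ms <= 1000 and payload_mb <= 6:
        return "real-time-endpoint"                  # e.g. premium calculator
    return "asynchronous-inference"                  # e.g. complex broker risks
```

A usage example: `choose_inference_mode(200, 0.01)` returns `"real-time-endpoint"`, while `choose_inference_mode(None, 500.0)` falls through to `"batch-transform"`.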

Frequently Asked Questions

What are the BaFin requirements for algorithmic underwriting decisions?
BaFin expects documented, auditable, and non-discriminatory algorithmic systems. Customers have a right to explanation under GDPR Art. 22 for automated decisions. Amazon SageMaker Clarify provides SHAP-based explainability reports as the technical foundation.
Which data sources are typically used for AI underwriting?
Historical loss data, external scoring services, weather data, geodata, IoT telematics (motor), building data, and medical data (health/life insurance). AWS provides suitable ingestion and storage services for every data type.
How long does it take to bring an AI underwriting model to production?
With SageMaker and existing historical data: 8–12 weeks for development and testing. Regulatory approval and production integration take a further 3–6 months. Storm Reply supports the entire process, including BaFin-compliant documentation.

Sources

  1. Amazon SageMaker Clarify — Bias Detection and Explainability
  2. AWS Lake Formation — Data Lake Governance
  3. AWS Insurance Solutions
  4. Amazon HealthLake — Medical Data on AWS
  5. Gartner — Insurance Technology Trends

AI underwriting for your insurance company?

Storm Reply develops ML pipelines on AWS — from data strategy to a regulatory-compliant production solution.

Get in touch