
Building an AI-Powered Financial Asset Extractor: From 74% to 99% Accuracy

In the internet era, our email inboxes double as financial filing cabinets: bank statements, insurance policies, brokerage confirmations, and real estate documents all arrive by email. But here's the challenge: how do you systematically track and organize your financial assets when they're scattered across thousands of emails?
Tracking financial assets manually is tedious and error-prone, and in our testing, traditional regex-based parsing reached only 74% accuracy. What if AI could automatically extract, classify, and organize this information with 99% accuracy?
That is what we built. In this article, we walk you through how we built it, the challenges we faced, and the solutions we found for classifying financial-asset emails.

At Eternalight, we built a Java Spring Boot application that:
Connects to Gmail API (with user consent) to read emails
Stores emails in PostgreSQL for data persistence
Uses OpenAI's GPT-4 to intelligently extract structured asset data
Classifies assets into 9 predefined categories
Processes data 30x faster using batch processing and multithreading
The 9 asset categories are:
Bank Accounts
Brokerage Accounts
Insurance Policies
Retirement Accounts
Real Estate
Professional Contacts
Crypto Accounts
Safe Deposit Box
Vehicle Information
Below, we break our approach into three versions, showing how we tackled email classification for financial assets.
Our first version tried to extract data from emails using only basic regular expressions. The method was quick and inexpensive, but it struggled with the variability of real-world emails, ambiguous language, and inconsistent account-number formats, exposing the weakness of rule-based systems.
Results: 74% accuracy, ₹0 cost, Fast processing
Why it failed:
Emails have inconsistent formats
Natural language is hard to parse with patterns
Missing context leads to misclassification
Account numbers in various formats (XXXX-1234, ****1234, etc.)
Example regex limitations:
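import java.util.regex.Pattern;
// Naive: "Account" then the first 4+ digit run on the same line. Case-sensitive, so it
// misses "account", and it can capture an amount instead of the account number.
Pattern accountPattern = Pattern.compile("Account.*?(\\d{4,})");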
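To make these failure modes concrete, here is a small, self-contained demo that runs the same pattern against three sample strings. The class name and the sample strings are hypothetical illustrations, not our production code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demonstrates why the naive account-number regex is brittle.
class NaiveRegexDemo {
    // Same pattern as above: "Account", then the first run of 4+ digits on the same line.
    static final Pattern ACCOUNT_PATTERN = Pattern.compile("Account.*?(\\d{4,})");

    static Optional<String> extractAccount(String emailText) {
        Matcher m = ACCOUNT_PATTERN.matcher(emailText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Works: the happy path the pattern was written for.
        System.out.println(extractAccount("Account number: 98765432"));
        // Misses: "account" is lowercase, and the "Account:" line contains no digits.
        System.out.println(extractAccount(
            "Your Chase checking account (****1234) has been credited with $5,000.\n"
            + "Account: Chase Premier Checking"));
        // Wrong: captures the balance instead of the account number.
        System.out.println(extractAccount("Account balance: 25000 on card ending 1234"));
    }
}
```

Running `main` prints `Optional[98765432]`, `Optional.empty` and `Optional[25000]`: the second email never yields its account number, and the third yields the balance instead.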
Our second version added the OpenAI API, processing emails one at a time. The model achieved near-perfect accuracy by understanding context, intent, and subtle language. However, sequential processing meant high latency, which led to a poor user experience and poor scalability.
Results: 99% accuracy, ~₹90 cost (90 emails), 90 seconds
The breakthrough: AI understands context and nuance
The problem: Terrible user experience
1 second per email = 90 seconds for 90 emails
Users wait too long for results
Not scalable for large email volumes
Our third version combined AI accuracy with system-level optimization. Using batch processing and multithreading to process emails in parallel cut end-to-end latency from about 90 seconds to 2-3 seconds, while cost and accuracy stayed the same.
Results: 99% accuracy, ~₹90 cost, 2-3 seconds (90 emails)
The solution: Process multiple emails in parallel batches
The core of the system is a method that sends batches of emails to OpenAI.
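A minimal sketch of such a method, assuming the OpenAI Chat Completions endpoint (`https://api.openai.com/v1/chat/completions`) and the JDK's built-in `java.net.http` client. The class and method names are hypothetical, and production code would typically build the JSON with a library such as Jackson rather than by hand.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client that sends one batch of emails to the Chat Completions API.
class OpenAiAssetClient {
    private static final URI CHAT_URL = URI.create("https://api.openai.com/v1/chat/completions");

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // System message carries the extraction prompt; user message carries the email batch.
    static String buildRequestBody(String model, String systemPrompt, String emails) {
        return "{\"model\":" + jsonString(model)
            + ",\"temperature\":0"
            + ",\"messages\":["
            + "{\"role\":\"system\",\"content\":" + jsonString(systemPrompt) + "},"
            + "{\"role\":\"user\",\"content\":" + jsonString(emails) + "}]}";
    }

    static HttpRequest buildRequest(String apiKey, String body) {
        return HttpRequest.newBuilder(CHAT_URL)
            .timeout(Duration.ofSeconds(30))
            .header("Authorization", "Bearer " + apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Sends one batch and returns the raw JSON response for the caller to parse.
    static String extractAssets(HttpClient client, String apiKey, String systemPrompt, String emails)
            throws Exception {
        HttpRequest request = buildRequest(apiKey, buildRequestBody("gpt-4", systemPrompt, emails));
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("OpenAI returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Setting `temperature` to 0 is our assumption here; it makes the JSON output more repeatable, which matters when the response is parsed automatically.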
The prompt is the brain of the system. Here is what makes it effective:
Extract the asset-related information from the following user emails as JSON.
Create a JSON array of objects using the
following keys (all values should be strings):
ASSET TYPES (must match exactly):
- Bank Accounts
- Brokerage Accounts
- Insurance Policy
- Retirement Accounts
- Real Estate
- Professional Contacts
- Crypto Accounts
- Safe Deposit Box
- Vehicle Information
EXTRACTION RULES:
1. Focus on the email body field primarily
2. Use subject/to fields for supplementary context
3. Create separate entries for each account number
4. Replace asterisks (*) or dots (.) with 'X' in account numbers
5. Remove words like 'Checking' or 'Savings' from account numbers
6. Classify mortgages/loans/insurance as "Real Estate"
7. Omit keys with missing values
8. Return ONLY valid JSON array (no markdown, no preamble)
OUTPUT FORMAT:
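One way to assemble this prompt and a batch of emails in Java is sketched below. The `Email` record, its field names (chosen to mirror the body/subject/to rules), and the "EMAIL n" layout are hypothetical; the system prompt is abbreviated and the full rule list would go in its place.

```java
import java.util.List;

// Hypothetical helper that builds the system prompt and the per-batch user message.
class AssetPromptBuilder {
    // One stored email; fields mirror the prompt's body/subject/to rules.
    record Email(String subject, String to, String body) {}

    static final List<String> ASSET_TYPES = List.of(
        "Bank Accounts", "Brokerage Accounts", "Insurance Policy", "Retirement Accounts",
        "Real Estate", "Professional Contacts", "Crypto Accounts", "Safe Deposit Box",
        "Vehicle Information");

    // Abbreviated system prompt: asset types plus the JSON-only output rule.
    static String systemPrompt() {
        return "Extract asset-related information from the user's emails as a JSON array.\n"
            + "ASSET TYPES (must match exactly):\n- " + String.join("\n- ", ASSET_TYPES) + "\n"
            + "Return ONLY a valid JSON array (no markdown, no preamble).";
    }

    // Serializes a batch into one user message, one numbered block per email.
    static String userMessage(List<Email> batch) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < batch.size(); i++) {
            Email e = batch.get(i);
            sb.append("EMAIL ").append(i + 1).append('\n')
              .append("subject: ").append(e.subject()).append('\n')
              .append("to: ").append(e.to()).append('\n')
              .append("body: ").append(e.body()).append("\n\n");
        }
        return sb.toString();
    }
}
```

Keeping the asset types in a single list means the prompt and any downstream validation of the model's `assetType` values can share one source of truth.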
Key prompt engineering principles:
Clear, specific instructions
Exact format requirements
Edge case handling
No room for interpretation
JSON-only output (easy to parse)
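Even with a precise prompt, extraction rules 4 and 5 are cheap to enforce deterministically after parsing the model's output, as a safety net. The sketch below is an assumption on our part (hypothetical class name) and should be applied only to the account-number field, since replacing dots elsewhere would corrupt ordinary text.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor enforcing extraction rules 4 and 5 on account numbers.
class AccountNumberNormalizer {
    private static final Pattern MASK_CHARS = Pattern.compile("[*.]");
    private static final Pattern ACCOUNT_WORDS = Pattern.compile("(?i)\\b(checking|savings)\\b");

    static String normalize(String rawAccountNumber) {
        // Rule 4: replace asterisks and dots with 'X'.
        String s = MASK_CHARS.matcher(rawAccountNumber).replaceAll("X");
        // Rule 5: drop words like "Checking" or "Savings".
        s = ACCOUNT_WORDS.matcher(s).replaceAll("");
        // Tidy leftover whitespace.
        return s.trim().replaceAll("\\s{2,}", " ");
    }
}
```

For example, `normalize("Checking ..1234")` returns `XX1234` and `normalize("****1234")` returns `XXXX1234`.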
The game-changer for performance was batch processing combined with multithreading.
Performance improvement:
Before: 90 emails × 1 second = 90 seconds
After: 90 emails ÷ 10 per batch = 9 batches × 0.3 seconds ≈ 2.7 seconds (~3 seconds)
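The batching idea can be sketched with a plain `ExecutorService`: split the emails into fixed-size batches, submit each batch as a task, and collect results in order. The class name, batch size and thread count are hypothetical; `processBatch` stands in for whatever sends one batch to the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: parallel batch processing on a fixed thread pool.
class BatchProcessor {
    // Splits items into consecutive batches of at most batchSize.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Runs processBatch on every batch concurrently; results keep the original batch order.
    static <T, R> List<R> processInParallel(List<T> items, int batchSize, int threads,
                                            Function<List<T>, List<R>> processBatch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                futures.add(pool.submit(() -> processBatch.apply(batch)));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                results.addAll(f.get()); // blocks until that batch finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 90 emails and a batch size of 10, `partition` yields 9 batches. The thread count should respect the API's rate limits; more threads only help until requests start being throttled.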
| Metric | Regex | AI (Sequential) | AI (Batch) |
|---|---|---|---|
| Accuracy | 74% | 99% | 99% |
| Cost (90 emails) | ₹0 | ₹90 | ₹90 |
| Time (90 emails) | < 1 second | ~90 seconds | 2–3 seconds |
| Scalability | Limited | Poor | High |
| Maintenance Effort | High | Low | Low |
Before:
User: "Analyze my emails"
System: *90 seconds later* "Here are your assets"
User: (probably left the page)
After:
User: "Analyze my emails"
System: *3 seconds later* "Here are your assets"
User: "That was fast!"
Email snippet:
Your Chase checking account (****1234) has been credited with $5,000.
Account: Chase Premier Checking
Regex output: Misses "Chase Premier Checking" context
AI output:
Here are our best cost-optimization strategies:
Cost savings: ~₹84 per 90 emails (from ₹90 to ₹6)
Cost savings: ~60% reduction while maintaining accuracy
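One way to implement a hybrid regex+AI approach is a cheap keyword pre-filter that skips obviously non-financial emails before any API call is made. This is a minimal sketch under our own assumptions: the class name and keyword list are hypothetical, and a filter like this trades some recall for cost, so it should be tuned against labelled emails.

```java
import java.util.regex.Pattern;

// Hypothetical pre-filter: only emails with financial hints are sent to the AI model.
class FinancialPreFilter {
    private static final Pattern FINANCIAL_HINTS = Pattern.compile(
        "(?i)\\b(accounts?|statements?|polic(?:y|ies)|premiums?|brokerage|retirement|ira"
        + "|mortgages?|deeds?|crypto|wallets?|vehicles?|safe deposit)\\b");

    // True if the subject or body mentions at least one financial keyword.
    static boolean worthSendingToAi(String subject, String body) {
        return FINANCIAL_HINTS.matcher(subject).find() || FINANCIAL_HINTS.matcher(body).find();
    }
}
```

In a pipeline this would sit in front of batching, e.g. `emails.stream().filter(e -> FinancialPreFilter.worthSendingToAi(e.subject(), e.body()))`.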

The same architecture can power:
Resume screening: extract skills, experience, education, and salary expectations; match against job requirements with scoring
Contract analysis: extract parties, dates, obligations, and payment terms; flag risky clauses automatically
Invoice processing: extract vendor, amount, due date, and line items; auto-categorize expenses for accounting
Support ticket triage: classify as Bug/Feature/Question/Complaint; route to the appropriate team with priority
Legal document review: extract clauses, deadlines, parties, and obligations; compare against templates
After deploying to 50 beta users:
Emails processed: 12,450
Assets extracted: 3,127
Average accuracy: 98.7%
Average processing time: 2.4 seconds (100 emails)
User satisfaction: 4.8/5.0
Cost per user/month: ₹24
User feedback:
"I discovered 3 bank accounts I forgot I had!" - User A
"Finally, all my assets in one place. Game changer." - User B
Here are the challenges we ran into while building the email classification tool, and how we solved them:
Solution: Comprehensive prompt with edge cases + GPT-4's contextual understanding
Solution: Batch processing + exponential backoff retry logic
Solution: Hybrid regex+AI approach + GPT-3.5-turbo
Solution: Hash email content + check before processing
Solution: Show confidence scores + allow manual verification
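Two of these solutions, content hashing and exponential backoff, can be sketched in a few lines. The class name is hypothetical, the in-memory set stands in for a persistent store (for example a hash column in PostgreSQL), and a real retry loop should only retry transient failures such as HTTP 429 or 5xx, not every exception.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guards: skip already-processed emails and retry flaky API calls.
class ProcessingGuards {
    // In-memory stand-in for a persistent table of processed content hashes.
    private final Set<String> processedHashes = ConcurrentHashMap.newKeySet();

    static String sha256Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e); // every JDK ships SHA-256
        }
    }

    // Returns true the first time this content is seen, false for duplicates.
    boolean markIfNew(String emailContent) {
        return processedHashes.add(sha256Hex(emailContent));
    }

    // Retries with exponential backoff: waits baseDelayMs, then 2x, 4x, ... between attempts.
    static <T> T withBackoff(Callable<T> call, int maxAttempts, long baseDelayMs) throws Exception {
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e; // give up and surface the last error
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

Hashing the content rather than relying on message IDs also catches the same statement forwarded or delivered twice.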
Real-time Processing: WebSocket notifications when new assets detected
Asset Valuation: Integrate with market APIs for real-time valuations
Portfolio Analytics: Show asset allocation, diversification insights
Estate Planning: Generate reports for legal/tax purposes
Multi-language Support: Process emails in any language
Mobile App: React Native app for on-the-go access
AI Recommendations: Suggest portfolio optimization strategies
AI > Regex for unstructured data (99% vs 74% accuracy)
Batch processing is crucial for API-heavy operations
Prompt engineering is an art - invest time in crafting perfect prompts
Cost optimization matters - hybrid approaches save money
User experience trumps everything - 3s vs 90s makes or breaks adoption
Solve real problems - email chaos is universal
Start with MVP - regex first, then AI
Measure everything - accuracy, speed, cost
Security is non-negotiable - especially with financial data
Scale thoughtfully - architecture decisions compound
Here is the technology stack behind our email classification tool:
Backend: Java 17, Spring Boot 3.x
Database: PostgreSQL 15
AI: OpenAI GPT-4 / GPT-3.5-turbo
Email: Gmail API (OAuth 2.0)
Processing: Java ExecutorService (multithreading)
Caching: Spring Cache (Redis-ready)
Security: AES-256 encryption, JWT authentication
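The stack lists AES-256 encryption without naming a mode. As a sketch under that gap, here is AES-256 in GCM mode (an assumption on our part) using only `javax.crypto`, storing a random 12-byte IV in front of the ciphertext. The class name is hypothetical, and key management (for example a KMS) is out of scope here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper for encrypting email bodies at rest with AES-256-GCM.
class EmailCrypto {
    private static final int IV_BYTES = 12;   // recommended IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // Output layout: [12-byte IV][ciphertext + tag]. A fresh IV is used for every call.
    static byte[] encrypt(SecretKey key, String plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Fails with AEADBadTagException if the stored bytes were tampered with.
    static String decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

GCM is chosen in this sketch because it authenticates as well as encrypts, so corrupted or altered rows are rejected instead of decrypting to garbage.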
Want to build something similar? Here are helpful resources:
OpenAI API Docs: https://platform.openai.com/docs
Gmail API Guide: https://developers.google.com/gmail/api
Spring Async Guide: https://spring.io/guides/gs/async-method
Prompt Engineering: https://www.promptingguide.ai
Building an AI-powered email classification tool for financial assets showed us that the future of software is intelligent automation. We are moving from rigid, rule-based systems to AI that understands context and nuance.
The results speak for themselves:
25-percentage-point accuracy improvement (74% → 99%)
30x speed improvement (90s → 3s)
Scalable architecture
Real-world impact for users
Whether in financial tools, HR systems, legal technology, or customer support platforms, AI-driven extraction is the future.
The best part? This is just the beginning. As AI models advance, the accuracy and capabilities of email classification systems like this one will only improve.
Happy building!

Tarun Kumar
(Author)
Software Engineer