
We are joined by Konstantin Berezin, a seasoned software developer currently working as a Back End Developer at Rapyd in Dubai. With extensive experience across multiple fintech companies including Sberbank, Master Delivery, and Rostelecom, Konstantin brings unique insights into the evolution of machine learning applications in financial technology. His notable work at Sberbank involved developing innovative mortgage approval systems and automated encumbrance removal processes that significantly improved operational efficiency and client experience.
How did you approach building NLP systems for mortgage rule interpretation before modern LLMs were available?
Before large language models were available, we used a combination of rule-based systems and traditional NLP techniques. I developed a mechanism that combined regular expressions, tokenization, and dependency parsing to recognize patterns in legal rules and mortgage regulations. Russian legal language was particularly challenging because of its formal grammar and heavy use of subordinate clauses, so we built domain-specific dictionaries to normalize the vocabulary and resolve the synonyms typical of Russian legal documents.
The pipeline itself was modular, with three stages: text preprocessing, rule extraction, and semantic validation. It was deterministic, nowhere near as fluent as today's LLMs, and targeted squarely at the regulatory needs of the banking industry. We could trace precisely why a rule was interpreted in a particular way, which regulators required as an audit trail.
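To make the shape of that pipeline concrete, here is a minimal sketch in Python. The three stages mirror the ones described above, but the regex patterns, dictionary entries, and function names are hypothetical illustrations, not the actual Sberbank implementation.

```python
import re

# Domain-specific dictionary for normalizing legal vocabulary and synonyms
SYNONYMS = {
    "ипотечный кредит": "ипотека",   # "mortgage loan" -> "mortgage"
    "заём": "кредит",                # "loan" -> "credit"
}

# Hand-written patterns for extracting quantitative rules from regulation text
RULE_PATTERNS = [
    (re.compile(r"не более (\d+)\s*%"), "max_percent"),   # "no more than N%"
    (re.compile(r"не менее (\d+) лет"), "min_years"),     # "no less than N years"
]

def preprocess(text: str) -> str:
    """Stage 1: normalize case/whitespace and map synonyms to canonical terms."""
    text = " ".join(text.lower().split())
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    return text

def extract_rules(text: str) -> list[dict]:
    """Stage 2: deterministic pattern matching; every match stays traceable."""
    rules = []
    for pattern, rule_type in RULE_PATTERNS:
        for match in pattern.finditer(text):
            rules.append({"type": rule_type,
                          "value": int(match.group(1)),
                          "span": match.span(),        # audit trail: exact source span
                          "source": match.group(0)})
    return rules

def validate(rules: list[dict]) -> list[dict]:
    """Stage 3: semantic validation, e.g. percentages must be plausible."""
    return [r for r in rules
            if r["type"] != "max_percent" or 0 < r["value"] <= 100]

rules = validate(extract_rules(preprocess("Первоначальный взнос не более 20 %")))
print(rules)  # each rule carries the exact text span it came from
```

Because every extracted rule records the exact span of source text it matched, the "why was this rule applied" question auditors ask has a mechanical answer.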
What machine learning models did you implement for the primary mortgage scoring system at Sberbank?
We employed gradient-boosted decision trees, specifically XGBoost and CatBoost, with CatBoost proving especially useful for Russian categorical data. CatBoost handles categorical features natively, without heavy preprocessing, which mattered because Russian banking data has plenty of categorical variables such as employment sector codes and regional classifications.
For feature engineering we combined bureau credit scores, salary account history, repayment history, and digital channel behavior data from within Sberbank. The surprise was how predictive the behavioral data turned out to be: digitally active customers who used multiple products defaulted less often. The models improved approval rate accuracy by around 15% over logistic regression baselines without sacrificing explainability, which we maintained through SHAP values and feature importance analysis.
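As a concrete illustration, here is a minimal sketch of that setup: CatBoost consuming categorical columns natively, with SHAP attributions for each decision. The feature names and synthetic data are assumptions for the example, not the production feature set.

```python
import pandas as pd
from catboost import CatBoostClassifier
import shap

# Synthetic stand-ins for bureau scores, salary history, and categorical codes
df = pd.DataFrame({
    "bureau_score":      [640, 720, 580, 690, 710, 600],
    "salary_months":     [12, 36, 6, 24, 48, 9],
    "employment_sector": ["IT", "GOV", "RETAIL", "IT", "GOV", "RETAIL"],
    "region_code":       ["77", "78", "23", "77", "66", "23"],
    "defaulted":         [0, 0, 1, 0, 0, 1],
})
X, y = df.drop(columns="defaulted"), df["defaulted"]
cat_features = ["employment_sector", "region_code"]

# CatBoost consumes the categorical columns directly, no manual encoding needed
model = CatBoostClassifier(iterations=100, depth=4, verbose=False)
model.fit(X, y, cat_features=cat_features)

# SHAP gives per-decision attributions that auditors can inspect
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(dict(zip(X.columns, shap_values[0])))  # contributions for the first applicant
```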
Can you explain how your automation framework reduced developer time by 30% per sprint?
I implemented a reusable automation platform for data ingestion, preprocessing, and model validation. It unified the ETL pipelines and test harnesses, so developers no longer had to replicate data validation and model monitoring scripts every sprint. Previously, each new feature came with its own custom scripts, which led to duplicated effort and inconsistencies.
Automated data consistency checks and unit tests minimized the rework that used to occur when data quality issues surfaced late in development. The net result was a measurable 30% saving in developer time per sprint, since teams could concentrate on model improvements instead of infrastructure. The framework also shortened ramp-up time for new developers, who could build on the existing tools.
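A minimal sketch of that reusable-validation idea: declarative checks that every pipeline shares instead of per-feature one-off scripts. The check names, columns, and thresholds here are hypothetical.

```python
import pandas as pd

def check_no_nulls(df: pd.DataFrame, columns: list[str]) -> list[str]:
    """Flag any required column containing nulls."""
    return [f"{c}: {df[c].isna().sum()} nulls" for c in columns if df[c].isna().any()]

def check_range(df: pd.DataFrame, column: str, lo: float, hi: float) -> list[str]:
    """Flag values outside an expected range."""
    bad = df[(df[column] < lo) | (df[column] > hi)]
    return [f"{column}: {len(bad)} values outside [{lo}, {hi}]"] if len(bad) else []

def run_checks(df: pd.DataFrame) -> list[str]:
    """Single entry point that every ingestion pipeline reuses."""
    issues = []
    issues += check_no_nulls(df, ["bureau_score", "salary_months"])
    issues += check_range(df, "bureau_score", 300, 850)  # hypothetical bounds
    return issues

df = pd.DataFrame({"bureau_score": [640, 999], "salary_months": [12, None]})
for issue in run_checks(df):
    print("DATA QUALITY:", issue)
```

The payoff is that a new pipeline calls one entry point instead of rewriting the checks, which is exactly where the per-sprint duplication used to come from.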
How do you compare your custom NLP implementation with today’s large language models for financial applications?
Our custom NLP engine was deterministic, expert-driven, and traceable: perfect for compliance, but not very flexible. If auditors wanted to know how a particular rule was invoked, we could provide a full trace. That deterministic behavior was exactly what compliance environments demanded, but accommodating new document types meant manually adjusting rules and dictionaries.
Modern LLMs are far better at processing unstructured text, but they require tuning, guardrails, and multiple layers of explainability before they can be deployed to production in finance. Guardrails are needed to stop hallucinations, and explainability mechanisms are required to satisfy regulators. The bottom line: the old systems were static but safe; today's LLMs are dynamic but must be tightly controlled to be production-ready.
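As one concrete illustration of such a guardrail layer, here is a minimal sketch that validates an LLM's structured output against a strict schema before it can reach any downstream financial workflow. The schema, action vocabulary, and rejection policy are illustrative assumptions, not a specific production design.

```python
import json

# Closed vocabulary of actions the system is permitted to take
ALLOWED_ACTIONS = {"approve_review", "escalate_to_human", "request_documents"}

def validate_llm_output(raw: str) -> dict:
    """Reject anything that is not well-formed, in-vocabulary, and bounded."""
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        raise ValueError("guardrail: output is not valid JSON")
    if out.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"guardrail: unknown action {out.get('action')!r}")
    conf = out.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("guardrail: confidence missing or out of range")
    if "rationale" not in out:  # required for the audit trail
        raise ValueError("guardrail: missing rationale")
    return out

# A hallucinated action is blocked instead of silently executed
try:
    validate_llm_output('{"action": "wire_funds", "confidence": 0.9, "rationale": "..."}')
except ValueError as e:
    print(e)  # guardrail: unknown action 'wire_funds'
```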
What made your encumbrance removal system innovative in the Russian fintech market?
At the time, as is still often the case today, most post-repayment encumbrance removals were done manually via paper requests to the registries, which could take weeks or months. I built an automated system that interacted directly with government APIs (Rosreestr) to verify repayment, generate the necessary legal filings, and submit them electronically.
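A heavily simplified sketch of that flow is below. The endpoint paths, payload fields, and base URL are hypothetical placeholders; the interview does not describe the real Rosreestr integration or its message formats.

```python
import requests

BASE = "https://registry.example/api"  # placeholder, not a real Rosreestr URL

def release_encumbrance(loan_id: str, property_id: str) -> str:
    # 1. Verify the loan is fully repaid (internal system of record)
    status = requests.get(f"{BASE}/loans/{loan_id}/status", timeout=10).json()
    if status["state"] != "REPAID":
        raise ValueError("loan not fully repaid; cannot file for release")

    # 2. Generate the legal filing and submit it electronically
    filing = {"property_id": property_id,
              "loan_id": loan_id,
              "request_type": "ENCUMBRANCE_RELEASE"}
    resp = requests.post(f"{BASE}/filings", json=filing, timeout=10)
    resp.raise_for_status()

    # 3. Return the registry tracking number for status polling
    return resp.json()["tracking_id"]
```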
This cut processing time from weeks to hours, reduced errors, and delivered a market-first automated encumbrance release. The business benefits were substantial: better customer satisfaction, lower call center volume, lower operational cost, and fewer legal errors. It was a real competitive advantage while competing banks were still doing it manually.
How did you ensure ML model compliance and explainability in a heavily regulated banking environment?
We operated on three principles. First, models had to be interpretable, so we used SHAP values and decision tree visualizations for auditors, and we trained non-technical stakeholders to read those visualizations. Second, all training and scoring activity was documented in an immutable audit trail: every model training run and every scoring decision was captured with full lineage in append-only systems.
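One common way to implement such an append-only trail is hash chaining, where each record embeds the hash of its predecessor so any retroactive edit breaks the chain. The sketch below illustrates that idea; the record fields are assumptions, and the interview does not specify the actual storage design.

```python
import hashlib, json, time

class AuditLog:
    def __init__(self):
        self._records = []
        self._last_hash = "0" * 64  # genesis hash

    def append(self, event: dict) -> None:
        """Append a record that carries the hash of the previous one."""
        record = {"ts": time.time(), "prev": self._last_hash, **event}
        self._last_hash = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = self._last_hash
        self._records.append(record)

    def verify(self) -> bool:
        """Recompute the chain; any tampered record breaks it."""
        prev = "0" * 64
        for r in self._records:
            body = {k: v for k, v in r.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if body["prev"] != prev or digest != r["hash"]:
                return False
            prev = r["hash"]
        return True

log = AuditLog()
log.append({"model": "mortgage_v3", "applicant": "A-1024", "score": 0.87})
assert log.verify()
```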
Third, features were scrutinized for regulatory compliance. Automated audits flagged features derived from prohibited data, and we ran frequent bias checks to ensure models did not discriminate against protected characteristics. Regular model validation and stress testing confirmed that the models met central bank standards under different economic conditions.
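A minimal sketch of what such an automated feature audit can look like; the prohibited attributes and proxy mappings are hypothetical examples, not an actual regulatory list.

```python
# Attributes that may never enter a scoring model (illustrative)
PROHIBITED = {"ethnicity", "religion", "marital_status"}
# Features known to leak prohibited information indirectly (hypothetical)
KNOWN_PROXIES = {"first_name": "ethnicity", "district": "ethnicity"}

def audit_features(features: list[str]) -> list[str]:
    """Flag directly prohibited features and known proxies for them."""
    findings = []
    for f in features:
        if f in PROHIBITED:
            findings.append(f"{f}: directly prohibited")
        elif f in KNOWN_PROXIES:
            findings.append(f"{f}: potential proxy for {KNOWN_PROXIES[f]}")
    return findings

print(audit_features(["bureau_score", "district", "salary_months"]))
# ['district: potential proxy for ethnicity']
```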
What were the key technical challenges in scaling your system to process over 1,000 mortgage applications daily?
The most important issues were latency and concurrency. We used asynchronous processing and distributed queues (RabbitMQ) to absorb traffic spikes around lunchtime and in the evenings, optimized database indexes for faster credit history lookups, and cached bureau queries where regulations permitted.
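Here is a minimal sketch of that queue-based decoupling using pika, a common RabbitMQ client. The queue name and scoring stub are assumptions; the point is that scoring workers drain a durable queue, so traffic spikes buffer in RabbitMQ instead of overwhelming the scorers.

```python
import pika

def score_application(payload: bytes) -> None:
    ...  # stub: run feature lookup + model inference here

def on_message(ch, method, properties, body):
    score_application(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="mortgage_applications", durable=True)
channel.basic_qos(prefetch_count=10)  # bound in-flight work per worker
channel.basic_consume(queue="mortgage_applications", on_message_callback=on_message)
channel.start_consuming()
```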
We also had to make the feature generation pipelines efficient and stable. We addressed this with a feature store that guaranteed consistency between batch and real-time scoring, avoiding bugs where models behaved differently in production than during validation. We also added end-to-end monitoring to catch data distribution anomalies and latency regressions, with retries and circuit breakers to handle failures in external dependencies.
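The circuit-breaker piece can be as simple as the sketch below: after repeated failures the breaker opens and fails fast, then allows a single probe call once a cooldown passes. The thresholds and the dependency call are illustrative assumptions.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one probe call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

breaker = CircuitBreaker()
# breaker.call(fetch_bureau_report, applicant_id)  # hypothetical dependency call
```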
How do ML approaches differ between Russian fintech and the global markets you work in now?
Russian fintech has traditionally focused on affordability, regulatory compliance, and home-grown technology, because Western systems were often inaccessible. At Sberbank we built our own data centers and ML platforms. The Central Bank of Russia set out explicit regulations for credit scoring models, which provided regulatory certainty but narrowed the space for innovation.
Cross-border markets rely more on cloud-native ML pipelines, MLOps platforms such as AWS SageMaker, and pre-trained models. The compliance frameworks differ as well: in Russia the central bank strictly imposed feature limitations, while abroad GDPR and CCPA drive a greater emphasis on data privacy and fairness audits. The pace of innovation is higher in cross-border markets, though the conservatism of Russian banks sometimes spared them the reliability issues that aggressive cross-border rollouts ran into.
What metrics did you use to measure the business impact of your ML implementations at Sberbank?
At Sberbank, I measured business impact mainly through operational efficiency and customer outcomes. The mortgage pre-approval process cut average application processing time by approximately 20%, which significantly improved customer experience and conversion rates. Mortgage approvals rose from roughly 1,000 to 1,150 per day, a 15% increase, with default rates holding steady.
The automation infrastructure freed roughly 30% of developer time per sprint, accelerating new feature delivery. I also tracked technical quality metrics such as model prediction latency, system uptime, and data quality scores. Together, these metrics demonstrated clear business value: higher throughput, better customer satisfaction, and lower operating expenses.
Where do you see the most promising opportunities for AI in financial services over the next few years?
The greatest potential lies in personalized financial products, fraud detection, and process automation. Real-time AI can analyze customer activity across every banking channel and present precisely targeted credit offers and financial advice exactly when customers need them. In fraud detection, graph neural networks and anomaly detection can uncover sophisticated fraud rings and novel fraud patterns without labeled examples of every fraud type. Generative AI is also transforming back-office work (writing regulatory reports, producing compliance documents, and handling customer support requests), augmenting human capacity rather than replacing it.
Just as vital are interpretable AI architectures for regulators, along with cross-border fintech optimization. Regulators need architectures in which LLMs and advanced models can be audited, their decisions explained, and their behavior certified as compliant. Building such frameworks with formal verification and explainability techniques will enable more ambitious AI deployments in regulated spaces. Meanwhile, AI can optimize foreign exchange transactions, remittance corridors, and payment routing across international jurisdictions, which is highly relevant to what I'm currently doing at Rapyd as we build global payment infrastructure.