[{"content":"A Google Apps Script that automatically cleans up your Gmail inbox. It discovers who’s flooding your inbox, lets you block them by domain, bulk-trashes junk, purges old promotions, and runs daily on autopilot.\nFeatures Discover — Scans Promotions, Updates, Social, and Spam. Ranks senders by volume. Block — Add unwanted domains to a block list. Stored in your Google account, not in the code. Clean — Bulk-trashes emails from blocked domains. Never touches Primary inbox. Automate — Daily trigger at 3am. Auto-continues via triggers if it hits execution limits. Safe — Everything goes to Trash (recoverable 30 days). Primary inbox is explicitly excluded. Tech Google Apps Script (ES5, zero dependencies) Jest test suite with 56 tests and mocked Google APIs GitHub Actions CI Auto-continuation triggers for long-running operations View on GitHub →\n","permalink":"https://rahuljoshi.dev/projects/gmail-inbox-sweeper/","summary":"Google Apps Script that automatically discovers spam senders, bulk-trashes by domain, and schedules daily cleanup. Open source.","title":"Gmail Inbox Sweeper"},{"content":"Journal of Computer Science and Technology Studies (JCSTS) Vol. 7, Issue 7, 2025, pp. 228-236 | Al-Kindi Publisher\nExamines practical challenges of implementing human-in-the-loop systems in financial services. Feature stores have become essential infrastructure that allow organizations to ensure alignment between model training and production systems while accommodating human-AI collaboration. 
Features designed for human use frequently vary from those optimized for algorithms — mathematical accuracy must be weighed against the interpretability requirements of analysts making justifiable choices.\nKeywords: Human-in-the-Loop Systems, Financial Services AI, Data Engineering Architecture, Explainable AI, Continuous Learning Pipelines\nRead paper → | ResearchGate →\n","permalink":"https://rahuljoshi.dev/publications/human-in-the-loop-ai/","summary":"How data engineering architecture enables human-AI collaboration in financial services — feature stores, explainability, and continuous learning pipelines.","title":"Human-in-the-Loop AI in Financial Services: Data Engineering That Enables Judgment at Scale"},{"content":"International Journal of Computing and Engineering (IJCE) Vol. 7, Issue 7, 2025, pp. 25-38 | CARI Journals\nThis paper addresses integrating artificial intelligence into banking systems through robust data engineering. Effective AI deployment depends on a strong data engineering framework capable of meeting the intricate demands of regulated financial settings. Key components include multi-zone architectures, advanced feature versioning, comprehensive governance frameworks tracking model lineage, and real-time processing capabilities. 
Successful implementation requires balancing reproducibility, explainability, security, and regulatory compliance across large-scale operations.\nKeywords: AI-Ready Banking Platforms, Multi-Zone Data Architecture, Feature Engineering Systems, Data Governance Frameworks, Real-Time Stream Processing\nRead paper →\n","permalink":"https://rahuljoshi.dev/publications/making-banking-platforms-ai-ready/","summary":"How effective AI deployment in banking depends on a robust data engineering framework — multi-zone architectures, feature versioning, governance, and real-time processing.","title":"Making Banking Platforms AI-Ready: The Data Engineering Foundation"},{"content":"Sarcouncil Journal of Engineering and Computer Sciences (SJECS) 2025, pp. 232-240 | SARC Publisher | ISSN: 2945-3585\nExamines the paradigm shift from monolithic enterprise data platforms to composable data architectures. Decoupled systems with standardized interfaces enable technological flexibility alongside robust governance. Implementation strategies discussed include the strangler pattern and domain-based decomposition, while addressing organizational obstacles to transition.\nKeywords: Composable Data Architecture, Metadata Orchestration, Technical Debt Reduction, Domain-Aligned Data Products, Governance Effectiveness\nRead paper →\n","permalink":"https://rahuljoshi.dev/publications/composable-data-architectures/","summary":"The paradigm shift from monolithic enterprise data platforms to composable architectures with decoupled systems and standardized interfaces.","title":"Composable Data Architectures: Moving Beyond the Monolithic Platform"},{"content":"Sarcouncil Journal of Multidisciplinary (SJMD) 2025, pp. 48-54 | SARC Publisher | ISSN: 2945-3445\nExamines the paradigm shift from model-centric to data-centric approaches in artificial intelligence. Data quality and governance determine AI system success more than model architecture. 
Explores architectural components including feature stores and metadata management, alongside responsible AI implementation through fairness and bias mitigation at the data layer.\nKeywords: Data-Centric AI, Feature Store Architecture, Lineage Tracking, Drift Detection, Responsible AI Implementation\nRead paper →\n","permalink":"https://rahuljoshi.dev/publications/data-centric-ai/","summary":"The paradigm shift from model-centric to data-centric AI — how data quality and governance at the platform level determine AI system success.","title":"Data-Centric AI: Engineering Platforms for Pre-Model Intelligence"},{"content":"European Modern Studies Journal (EMSJ) 2025 | Loro Journals\nExamines how the invisible hand of data engineering shapes social outcomes through seemingly technical decisions that disproportionately impact vulnerable populations. Reveals how household relationship models disadvantage non-traditional families, recovery mechanisms exacerbate inequalities during system outages, and default assumptions penalize those with limited financial histories. Engineering decisions are not merely technical — they are consequential policy choices with significant equity implications.\nKeywords: Data Engineering Ethics, Financial Inclusion, Automated Decision-Making, Systemic Equity, Platform Design\nRead paper →\n","permalink":"https://rahuljoshi.dev/publications/invisible-hand-of-data-engineering/","summary":"How data architecture choices, pipeline failures, and default values in automated decision-making create systemic barriers to financial inclusion.","title":"The Invisible Hand of Data Engineering: How Platform Decisions Impact People's Lives"},{"content":"Capital One Tech Blog\nAn analysis of the convergence between Delta Lake and Apache Iceberg — two leading open table formats for lakehouse architecture. 
Covers interoperability, feature parity, and what the convergence trend means for enterprises choosing a lakehouse strategy.\nRead on Capital One Tech Blog →\n","permalink":"https://rahuljoshi.dev/publications/lakehouse-convergence/","summary":"Analysis of format convergence between Delta Lake and Apache Iceberg — what it means for lakehouse architecture.","title":"Lakehouse Convergence: Delta Lake \u0026 Iceberg"},{"content":"Capital One Tech Blog\nA deep technical exploration of how Delta Lake implements ACID transactions on top of cloud object storage through its transaction log (DeltaLog). Covers the write-ahead log structure, optimistic concurrency control, checkpoint mechanisms, and how these enable reliable data operations at scale.\nRead on Capital One Tech Blog →\n","permalink":"https://rahuljoshi.dev/publications/delta-lake-transaction-logs/","summary":"Deep dive into how Delta Lake implements ACID transactions on cloud object storage through its transaction log.","title":"Delta Lake Transaction Logs Explained"},{"content":"Capital One Tech Blog\nTraces the architectural evolution of data lakes from early Hadoop-based implementations through modern lakehouse platforms. Covers the key inflection points — the shift from HDFS to cloud object storage, the emergence of open table formats, and the convergence of data warehousing and data lake patterns.\nRead on Capital One Tech Blog →\n","permalink":"https://rahuljoshi.dev/publications/evolution-of-data-lakes/","summary":"From Hadoop to modern lakehouse — tracing the architectural evolution of enterprise data lakes.","title":"Understanding the Evolution of Data Lakes"},{"content":"Pattern Recognition and Machine Intelligence (PReMI 2011) Springer, Lecture Notes in Computer Science, Vol. 
6744, 2011\nAuthors: Maunendra Sankar Desarkar, Rahul Joshi, Sudeshna Sarkar Affiliation: Department of Computer Science \u0026 Engineering, IIT Kharagpur\nProposes a displacement-based variant of the Kendall-Tau distance metric for unsupervised evaluation of rank aggregation algorithms. The metric considers rank position differences, enabling evaluation without ground truth rankings — applicable to metasearch engines and preference aggregation systems.\nKeywords: Rank Aggregation, Kendall-Tau Distance, Unsupervised Evaluation, Information Retrieval\nRead on Springer →\n","permalink":"https://rahuljoshi.dev/publications/rank-aggregation-springer/","summary":"A variant of Kendall-Tau distance metric for unsupervised evaluation of rank aggregation — Springer PReMI 2011.","title":"Displacement Based Unsupervised Metric for Evaluating Rank Aggregation"},{"content":"Overview Distinguished Data Engineer and Director at Capital One — an elite technical leadership role held by fewer than 0.5% of associates. 19+ years of experience architecting modern cloud-native data platforms that power intelligent products, decisioning systems, and analytics at enterprise scale.\nM.Tech from the Indian Institute of Technology (IIT) Kharagpur (GPA: 9.11/10). Fellow of the British Computer Society. Senior Member, IEEE. Fellow, IETE. Forbes Technology Council Member.\nDownload CV (PDF)\nCareer Capital One | Distinguished Data Engineer \u0026 Director | 2022–Present Leading data architecture across Capital One’s most critical platforms — from Enterprise Data Tech to Card Tech. Architected the ML Zone for production-scale model training, drove $10M+ in annual cloud cost savings, and led architecture of the enterprise-scale data lake (300+ PB) and multi-tenant cloud warehouse (50+ PB). 
Represents Card Tech in Capital One’s Distinguished Engineer community, shaping enterprise-wide platform direction.\nEY | Senior Manager, Data \u0026 Analytics, Financial Services | 2019–2022 Led high-impact data architecture engagements across Banking, Capital Markets, Wealth Management, and Insurance. Architected cloud-based enterprise data lakes and analytics hubs on AWS and Snowflake for major U.S. banks. Designed a graph-based resiliency framework for a global investment bank using Neo4j. Led architecture reviews presented at board level.\nIBM | Sr. Managing Consultant, Big Data \u0026 Analytics | 2015–2019 Architected hybrid cloud data lakes integrating 100+ source systems and 5B+ daily records for a major U.S. insurer. Designed a Kafka-based transport platform handling 4B+ daily events. Built Spark/MapReduce pipelines supporting 10,000+ daily workloads. Deployed predictive models that automated 1.5M+ IVR actions and reduced call volumes.\nPersistent Systems | Software Engineer → Architect | 2007–2015 Led architecture of next-gen hybrid data lake platforms in partnership with IBM. Ported Apache Hadoop to Windows, enabling the launch of Microsoft HDInsight on Azure. Engineered Microsoft’s ODBC Driver for Apache Hive. Built bi-directional Sqoop connectors for SQL Server / PDW integration. Built Persistent’s Big Data Analytics Library (PEBAL).\nNVIDIA | Software QA Engineer Trainee | 2006–2007 Quality assurance and validation of GPU device drivers and media decoders. 
DirectX and OpenGL compliance testing across hardware configurations and rendering pipelines.\nResearch \u0026 Publications Five peer-reviewed journal articles published in 2025, with work spanning data-centric AI, composable architectures, human-in-the-loop systems, and the ethics of data engineering.\nPeer-Reviewed Journal Articles (2025):\nMaking Banking Platforms AI-Ready: The Data Engineering Foundation — International Journal of Computing and Engineering, Vol. 7, Issue 7, pp. 25-38 Human-in-the-Loop AI in Financial Services: Data Engineering That Enables Judgment at Scale — Journal of Computer Science and Technology Studies, Vol. 7, Issue 7, pp. 228-236 Composable Data Architectures: Moving Beyond the Monolithic Platform — Sarcouncil Journal of Engineering and Computer Sciences, pp. 232-240 Data-Centric AI: Engineering Platforms for Pre-Model Intelligence — Sarcouncil Journal of Multidisciplinary, pp. 48-54 The Invisible Hand of Data Engineering: How Platform Decisions Impact People’s Lives — European Modern Studies Journal Conference Paper:\nDisplacement Based Unsupervised Metric for Evaluating Rank Aggregation — Springer PReMI 2011. Co-authored with M.S. Desarkar and S. Sarkar, IIT Kharagpur. 
Capital One Tech Blog:\nDelta Lake Transaction Logs Explained Lakehouse Convergence: Delta Lake \u0026 Iceberg Understanding the Evolution of Data Lakes LinkedIn Articles — Databricks Data + AI Summit 2025:\nThe Future of Data Platforms is Federated, Governed, and AI Ready — Day 1 The Data Intelligence Platform Goes Operational, App-Ready, and Agent-Aware — Day 2 The Lakehouse Is Evolving: Spark Declarative Pipelines, Efficient Interoperable Tables, and a Unified User Experience — Day 3 Professional Memberships Fellow, British Computer Society (BCS) Fellow, Institution of Electronics and Telecommunication Engineers (IETE) Senior Member, IEEE Member, Forbes Technology Council Full Member, Sigma Xi — The Scientific Research Honor Society Awards \u0026 Recognition Peer Reviewer, IEEE Conferences Judge, Globee Awards for Technology (2025) Service Excellence Award, IBM (2016) Eminence and Excellence Award, IBM (2015) Star Performer, Persistent Systems (2012) Best Outgoing Student, Computer Engineering Department (2006) Certifications AWS Certified Solutions Architect – Associate AWS Certified Cloud Practitioner SnowPro Core Certification (Snowflake) Enterprise Design Thinking Practitioner, IBM Education Indian Institute of Technology (IIT) Kharagpur | M.Tech, Computer Science \u0026 Engineering | 2009–2011 GPA: 9.11/10. Research in information retrieval, machine learning, and rank aggregation.\nPune University | B.E., Computer Engineering | 2002–2006 First Class with Distinction. Best Outgoing Student.\nConnect GitHub LinkedIn Email: visit2rahul@gmail.com ","permalink":"https://rahuljoshi.dev/about/","summary":"Rahul Joshi — Distinguished Data Engineer, researcher, and platform architect.","title":"About"}]