When a parse job is initiated, LendPathway runs a multi-step pipeline that turns raw PDFs into structured, analyzed financial data. This page explains exactly what happens at each step.

The Pipeline

Every parse follows this sequence:

1. Load Documents

All uploaded files are pulled from storage. Only PDF files are accepted — any other file type (images, spreadsheets, Word docs, etc.) is immediately marked as failed.
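The accept-or-fail rule above can be sketched in Python. This sketch assumes files are recognized by the `%PDF-` header that PDF files begin with, not by file extension; LendPathway's actual check isn't documented here, and `StoredFile`, `load_documents`, and the `"failed"` status string are hypothetical names.

```python
from dataclasses import dataclass

PDF_SIGNATURE = b"%PDF-"  # the header line a PDF file starts with


@dataclass
class StoredFile:  # hypothetical shape of a file pulled from storage
    name: str
    data: bytes
    status: str = "pending"


def load_documents(files: list[StoredFile]) -> list[StoredFile]:
    """Return the PDFs to process; mark every non-PDF file as failed."""
    accepted = []
    for f in files:
        if f.data.startswith(PDF_SIGNATURE):
            accepted.append(f)
        else:
            f.status = "failed"  # images, spreadsheets, Word docs, etc.
    return accepted
```

Checking the leading bytes rather than the extension also rejects, for example, a PNG that was renamed to `.pdf`.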

2. Classify Each Document

Each PDF is sent to an AI model that reads the document and identifies what kind of financial document it is. Classification runs in parallel — all PDFs are classified at the same time. The classifier can identify these document types:
Type | What it looks for
Bank Statement | Transactions, deposits, withdrawals, account balances
Credit Report | Experian / Equifax / TransUnion, credit scores, tradelines
Tax Form | IRS forms (1040, 1065, 1120, 1120-S, Schedule C, Schedule E, K-1, etc.)
Loan Application | Application form for financing
Receipt | Purchase receipt, business expense
DecisionLogic | Report from decisionlogic.com
AR Report | Accounts receivable report
Photo ID | Government-issued ID (driver’s license, passport)
Voided Check | Bank check showing account details
Documents that don’t match any type are marked Unsupported.
Of these, four have full parsing pipelines: Bank Statement, Credit Report, Tax Form, and Loan Application. The other five (Receipt, DecisionLogic, AR Report, Photo ID, Voided Check) are classified and stored but not parsed further.
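Classifying "all PDFs at the same time" amounts to fanning out one model call per document and waiting for all of them. A minimal asyncio sketch, where `classify_pdf` is a hypothetical stand-in for the AI classifier (a keyword stub here so the example runs) and the snake_case type strings are assumed labels for the types in the table above:

```python
import asyncio

# The four types that have full parsing pipelines.
FULLY_PARSED = {"bank_statement", "credit_report", "tax_form", "loan_application"}


async def classify_pdf(name: str, text: str) -> str:
    """Hypothetical stand-in for the AI classifier; a keyword stub here."""
    await asyncio.sleep(0)  # placeholder for the model call
    lowered = text.lower()
    if "ending balance" in lowered:
        return "bank_statement"
    if any(bureau in lowered for bureau in ("experian", "equifax", "transunion")):
        return "credit_report"
    if "form 1040" in lowered:
        return "tax_form"
    return "unsupported"


async def classify_all(docs: dict[str, str]) -> dict[str, str]:
    # asyncio.gather runs every classification concurrently and keeps input order.
    labels = await asyncio.gather(*(classify_pdf(n, t) for n, t in docs.items()))
    return dict(zip(docs, labels))


def needs_parsing(doc_type: str) -> bool:
    # Other types are classified and stored but not parsed further.
    return doc_type in FULLY_PARSED
```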

3. Run Type-Specific Pipelines

Based on classification, LendPathway runs the appropriate parser for each document type — in parallel. If you upload a mix of bank statements, a credit report, and a loan application, all three pipelines run simultaneously.
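Dispatch can be pictured as grouping documents by type and launching one pipeline per group concurrently. This is a sketch under assumptions: the four pipeline functions, their signatures, and their return values are hypothetical placeholders, not LendPathway internals.

```python
import asyncio
from collections import defaultdict


# Hypothetical placeholder pipelines; each receives every document of its type.
async def bank_statement_pipeline(docs):
    return f"bank statements: {len(docs)}"


async def credit_report_pipeline(docs):
    return f"credit reports: {len(docs)}"


async def tax_form_pipeline(docs):
    return f"tax forms: {len(docs)}"


async def loan_application_pipeline(docs):
    return f"loan applications: {len(docs)}"


PIPELINES = {
    "bank_statement": bank_statement_pipeline,
    "credit_report": credit_report_pipeline,
    "tax_form": tax_form_pipeline,
    "loan_application": loan_application_pipeline,
}


async def run_pipelines(classified: dict[str, str]) -> dict[str, str]:
    """classified maps document name -> document type."""
    groups = defaultdict(list)
    for name, doc_type in classified.items():
        if doc_type in PIPELINES:  # stored-only types are skipped
            groups[doc_type].append(name)
    types = list(groups)
    # One coroutine per document type, all awaited together.
    results = await asyncio.gather(*(PIPELINES[t](groups[t]) for t in types))
    return dict(zip(types, results))
```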

Bank Statement Pipeline

The bank statement pipeline is the most complex. Here’s what happens inside it, step by step.

Step 1 — Account Metadata Extraction

The AI reads all bank statement PDFs together in a single call and extracts:
  • Business identity — business name, address, phone, tax ID
  • Principals — owner names, roles, addresses, phone numbers
  • Account ledgers — every distinct bank account across all documents (account name, account number, bank name, routing number)
Each account is assigned a unique ID. This step establishes the map of accounts that the rest of the pipeline uses. If no bank accounts are found at all, the parse fails here. After this step, two things happen immediately:
  • The book name is updated to the extracted business name
  • AI Deep Research is kicked off in the background (more on this below)
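Building the account map can be sketched as deduplicating the extracted accounts and giving each distinct one an ID, failing the parse when none are found. This assumes accounts are distinct by bank name plus account number and that IDs are UUIDs; both are illustrative choices, and `ExtractedAccount`, `build_account_map`, and `NoBankAccountsError` are hypothetical names.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtractedAccount:  # hypothetical shape of one extracted account ledger
    account_name: str
    account_number: str
    bank_name: str
    routing_number: Optional[str] = None


class NoBankAccountsError(Exception):
    """Raised when no accounts are found; the parse fails at this step."""


def build_account_map(accounts: list[ExtractedAccount]) -> dict[str, ExtractedAccount]:
    """Assign one unique ID per distinct account (assumed key: bank + number)."""
    seen: dict[tuple[str, str], str] = {}
    account_map: dict[str, ExtractedAccount] = {}
    for acct in accounts:
        key = (acct.bank_name.strip().lower(), acct.account_number.strip())
        if key not in seen:
            account_id = str(uuid.uuid4())
            seen[key] = account_id
            account_map[account_id] = acct
    if not account_map:
        raise NoBankAccountsError("no bank accounts found in any statement")
    return account_map
```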

Step 2 — Statement Metadata Extraction

For each individual PDF (in parallel), the AI extracts statement-level metadata:
  • Statement start and end dates
  • Starting and ending balances, per account
  • Which accounts appear in this specific document
This is also where LendPathway figures out the date range for the book (e.g. “3 accounts, Jan 2024 to Dec 2024”).
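Once each statement's dates and accounts are known, the book's date-range label is a small aggregation: count the distinct accounts, take the earliest start date and the latest end date. A sketch; the `StatementMeta` shape and the label format are assumptions modeled on the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StatementMeta:  # hypothetical per-PDF metadata from this step
    start: date
    end: date
    account_ids: set[str]


def book_summary(statements: list[StatementMeta]) -> str:
    """Build a label like '3 accounts, Jan 2024 to Dec 2024'."""
    accounts = set().union(*(s.account_ids for s in statements))
    first = min(s.start for s in statements)
    last = max(s.end for s in statements)
    noun = "account" if len(accounts) == 1 else "accounts"
    return f"{len(accounts)} {noun}, {first:%b %Y} to {last:%b %Y}"
```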

Step 3 — Duplicate Detection

Only runs when there are 2 or more bank statement PDFs. The AI compares all statements and identifies redundant documents — complete duplicates or documents that are subsets of another (e.g. someone uploaded both a full 3-page statement and a 1-page summary of the same month). Redundant documents are marked as failed and removed before transaction extraction, preventing double-counted data.
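LendPathway performs this comparison with an AI model. As a deterministic illustration of the same rule (not the product's method), the sketch below treats each statement as a set of normalized text lines and flags any document whose lines are already covered by a document it keeps. How the text is obtained, and the normalization used, are assumptions.

```python
def normalize(line: str) -> str:
    # Collapse whitespace and case so cosmetic differences don't matter.
    return " ".join(line.split()).lower()


def find_redundant(docs: dict[str, list[str]]) -> set[str]:
    """Return names of documents that duplicate, or are subsets of, another."""
    if len(docs) < 2:
        return set()  # the step only runs with two or more statements
    line_sets = {
        name: {normalize(ln) for ln in lines if ln.strip()}
        for name, lines in docs.items()
    }
    # Visit larger documents first (stable sort) so a full statement is
    # kept before its summary, and the first of two exact copies is kept.
    order = sorted(line_sets, key=lambda n: len(line_sets[n]), reverse=True)
    kept: list[set[str]] = []
    redundant: set[str] = set()
    for name in order:
        lines = line_sets[name]
        if any(lines <= other for other in kept):
            redundant.add(name)  # exact duplicate or subset of a kept document
        else:
            kept.append(lines)
    return redundant
```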

Step 4 — Transaction Extraction

For each remaining statement (in parallel), the AI extracts every individual transaction:
  • Date
  • Description
  • Amount
  • Type (credit or debit)
This is the most computationally intensive step. If a document is too large for a single extraction call (it exceeds the model’s token limit), LendPathway automatically falls back to chunked extraction: pulling transactions in batches of ~100 at a time, up to 20 chunks, and merging the results. Each chunk receives context about where the previous chunk left off to avoid gaps.
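The chunked fallback can be sketched as a loop that requests up to ~100 transactions at a time, passes along the last transaction it already has so the next call resumes after it, and stops after 20 chunks. `extract_chunk` is a hypothetical interface to the model, and treating a short batch as the end of the statement is an assumption; the real prompt and context format are not documented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CHUNK_SIZE = 100  # ~100 transactions per call
MAX_CHUNKS = 20


@dataclass(frozen=True)
class Txn:
    date: str
    description: str
    amount: float
    type: str  # "credit" or "debit"


# Hypothetical model call: (pdf bytes, last transaction seen, limit) -> next batch.
ExtractChunk = Callable[[bytes, Optional[Txn], int], list[Txn]]


def extract_in_chunks(pdf: bytes, extract_chunk: ExtractChunk) -> list[Txn]:
    merged: list[Txn] = []
    last: Optional[Txn] = None
    for _ in range(MAX_CHUNKS):
        batch = extract_chunk(pdf, last, CHUNK_SIZE)
        if not batch:
            break  # nothing left to extract
        merged.extend(batch)
        last = batch[-1]  # context so the next chunk resumes after this one
        if len(batch) < CHUNK_SIZE:
            break  # assumed: a short batch means the statement is exhausted
    return merged
```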

Step 5 — Assembly

The extracted metadata and transactions are merged together into ledgers. A ledger is one account’s data within one statement document — its starting balance, ending balance, and list of transactions. A single PDF can produce multiple ledgers if it contains multiple accounts.
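Assembly can be sketched as grouping one statement's balances and transactions by account, yielding one ledger per (document, account) pair. The field names and input shapes below are hypothetical stand-ins for the outputs of Steps 2 and 4.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class Ledger:
    document: str
    account_id: str
    starting_balance: Optional[Decimal]
    ending_balance: Optional[Decimal]
    transactions: list[dict] = field(default_factory=list)


def assemble(document: str, balances: dict[str, tuple], transactions: list[dict]) -> list[Ledger]:
    """balances: account_id -> (starting, ending); each transaction carries 'account_id'."""
    ledgers = {
        acct: Ledger(document, acct, start, end)
        for acct, (start, end) in balances.items()
    }
    for txn in transactions:
        ledger = ledgers.get(txn["account_id"])
        if ledger is None:
            # Account seen only in transactions: no balances, so it can't be reconciled.
            ledger = ledgers[txn["account_id"]] = Ledger(document, txn["account_id"], None, None)
        ledger.transactions.append(txn)
    return list(ledgers.values())
```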

Step 6 — Reconciliation

After assembly, LendPathway reconciles each ledger independently (all ledgers in parallel). This is the mathematical verification step. The formula:
Starting Balance + Sum of All Credits − Sum of All Debits = Computed Ending Balance
The computed ending balance is compared against the ending balance printed on the statement. If they match (within $0.05), the ledger is reconciled, meaning the extracted transactions are a mathematically faithful representation of the bank’s own records.
If the ledger doesn’t reconcile on the first check, LendPathway enters a retry loop (up to 3 attempts). On each attempt, the AI receives:
  • The original PDF (ground truth)
  • The current list of extracted transactions as a CSV
  • The current discrepancy amount and direction (too high or too low)
  • If the statement has multiple accounts, a note about which account is being reconciled
The AI compares the extracted transactions against the PDF and can make three types of corrections:
  1. Flip a transaction’s type — if a credit was mistakenly extracted as a debit (or vice versa), flip it
  2. Remove a transaction — if a duplicate or nonexistent transaction was extracted
  3. Add a missing transaction — if a transaction visible in the PDF wasn’t extracted
The AI is instructed to make only corrections it can clearly verify in the PDF. It will never fabricate transactions to force the math to work. If the extraction looks correct but the math still doesn’t add up (e.g. the bank’s own statement has an internal discrepancy), the AI gives up and explains why.
After corrections are applied, the balance is rechecked. If the computed ending balance is within $0.05 of the printed ending balance, the ledger is reconciled. If not, the next attempt runs. After 3 failed attempts (or if the AI gives up), the ledger is marked not reconciled with an explanation of what went wrong.
Reconciliation is skipped entirely if the starting or ending balance couldn’t be extracted from the statement.
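The balance check and retry loop can be sketched as follows. The formula, the $0.05 tolerance, the three-attempt limit, the three correction kinds, and the skip rule come from the description above; `propose_corrections` is a hypothetical stand-in for the AI call, and the `(kind, txn)` correction format is an assumption. `Decimal` avoids floating-point rounding in currency arithmetic.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional

TOLERANCE = Decimal("0.05")  # allowed gap between computed and printed balance
MAX_ATTEMPTS = 3


@dataclass(frozen=True)
class Txn:
    description: str
    amount: Decimal  # positive; direction comes from `type`
    type: str  # "credit" or "debit"


# Hypothetical AI call: (transactions, discrepancy) -> list of (kind, txn) fixes,
# or None when it gives up. kind is "flip", "remove", or "add".
ProposeFn = Callable[[list, Decimal], Optional[list]]


def discrepancy(start: Decimal, end: Decimal, txns: list[Txn]) -> Decimal:
    """Computed ending balance minus printed ending balance (positive = too high)."""
    credits = sum((t.amount for t in txns if t.type == "credit"), Decimal(0))
    debits = sum((t.amount for t in txns if t.type == "debit"), Decimal(0))
    return (start + credits - debits) - end


def apply_fix(txns: list[Txn], kind: str, txn: Txn) -> list[Txn]:
    out = list(txns)
    if kind == "flip":  # credit <-> debit
        out[out.index(txn)] = Txn(txn.description, txn.amount,
                                  "debit" if txn.type == "credit" else "credit")
    elif kind == "remove":  # duplicate or nonexistent transaction
        out.remove(txn)
    elif kind == "add":  # transaction visible in the PDF but missed
        out.append(txn)
    return out


def reconcile(start: Optional[Decimal], end: Optional[Decimal], txns: list[Txn],
              propose_corrections: ProposeFn) -> tuple[str, list[Txn]]:
    if start is None or end is None:
        return "skipped", txns  # no balances to check against
    if abs(discrepancy(start, end, txns)) <= TOLERANCE:
        return "reconciled", txns
    for _ in range(MAX_ATTEMPTS):
        fixes = propose_corrections(txns, discrepancy(start, end, txns))
        if fixes is None:
            return "not_reconciled", txns  # the model gave up
        for kind, txn in fixes:
            txns = apply_fix(txns, kind, txn)
        if abs(discrepancy(start, end, txns)) <= TOLERANCE:
            return "reconciled", txns
    return "not_reconciled", txns
```

Passing the signed discrepancy to the model is one way to convey "too high or too low": a negative value means the computed balance falls short of the printed one.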

Step 7 — Tagging

After reconciliation, all transactions across all ledgers are assigned a global sequential ID (1, 2, 3, …) and then tagged. Tagging runs three processes in parallel:
AI Loan Tagging: The AI classifies transaction groups into debt/loan types:
Tag | Display Name
merchant_cash_advance | Merchant Cash Advance
bank_loan | Bank Loan
factoring | Factoring
credit | Credit Card
lease | Lease
auto | Auto Loan
mortgage | Mortgage
buy_now_pay_later | Buy Now Pay Later
debt_collection | Debt Collection
Each transaction can have at most one loan tag.
AI Core Tagging: The AI classifies transaction groups into activity categories. The AI receives business identity and account context to make accurate calls (e.g. knowing the business name helps identify internal transfers vs. external payments):
Tag | Display Name
internal_transfer | Internal Transfer
owner_transaction | Owner Transaction
payment_processor | Payment Processor
bank_fee | Bank Fee
bank_interest | Bank Interest
reversal | Reversal
cash | Cash
A transaction can have multiple core tags.
Deterministic Pattern Tagging: Rule-based regex matching (no AI involved) that identifies:
Tag | Display Name
check | Check
wire | Wire
peer_to_peer | P2P
stop_payment | Stop Payment
nsf | NSF
overdraft | Overdraft
NSF and overdraft tags are only applied to debits. A transaction can have multiple deterministic tags. All three tag types are then merged onto each transaction: loan tag first (if any), then core tags, then deterministic tags.
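Deterministic tagging, the merge order, and the global IDs can be sketched with Python's `re` module. The regular expressions below are illustrative guesses, not LendPathway's actual patterns; the debit-only rule for NSF and overdraft and the loan, core, deterministic merge order come from the description above. `deterministic_tags`, `merge_tags`, and `assign_ids` are hypothetical names.

```python
import re
from typing import Optional

# Illustrative patterns only; real bank descriptions vary widely.
PATTERNS = {
    "check": re.compile(r"\b(check|chk|cheque)\s*#?\s*\d+", re.I),
    "wire": re.compile(r"\bwire\b", re.I),
    "peer_to_peer": re.compile(r"\b(zelle|venmo|cash\s?app|paypal)\b", re.I),
    "stop_payment": re.compile(r"\bstop\s*(payment|pmt)\b", re.I),
    "nsf": re.compile(r"\b(nsf|insufficient\s+funds)\b", re.I),
    "overdraft": re.compile(r"\b(overdraft|od\s+fee)\b", re.I),
}
DEBIT_ONLY = {"nsf", "overdraft"}


def deterministic_tags(description: str, txn_type: str) -> list[str]:
    tags = []
    for tag, pattern in PATTERNS.items():
        if tag in DEBIT_ONLY and txn_type != "debit":
            continue  # NSF / overdraft tags apply to debits only
        if pattern.search(description):
            tags.append(tag)  # a transaction may collect several of these
    return tags


def merge_tags(loan_tag: Optional[str], core: list[str], det: list[str]) -> list[str]:
    # Loan tag first (at most one), then core tags, then deterministic tags.
    return ([loan_tag] if loan_tag else []) + core + det


def assign_ids(transactions: list[dict]) -> list[dict]:
    # Global sequential IDs across every ledger: 1, 2, 3, ...
    return [{**t, "id": i} for i, t in enumerate(transactions, start=1)]
```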

Step 8 — Position Detection

Positions are detected from the loan-tagged transactions. There are two methods, depending on loan type:
MCA Positions (AI-based): For Merchant Cash Advance transactions, an AI model matches transaction groups to known funders from your org’s funder registry. Each position gets a funder name, loan type, and the set of transaction IDs that belong to it. Funders from your registry include metadata like favicon, contact info, and website.
Other Loan Positions (algorithmic): For all other loan types (Bank Loan, Factoring, Auto, Lease, Mortgage, Debt Collection, Buy Now Pay Later), positions are detected using text similarity clustering. Transaction descriptions are converted to TF-IDF vectors (a term-weighting scheme widely used for text similarity), compared, and grouped into clusters. Each cluster becomes a position.
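The algorithmic path can be sketched in pure Python: vectorize descriptions with smoothed TF-IDF, link any two descriptions whose cosine similarity meets a threshold, and take the connected components (via union-find) as positions. The tokenizer (which drops purely numeric tokens such as reference numbers), the 0.5 threshold, and the linking rule are assumptions, not LendPathway's documented settings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase words; drop purely numeric tokens like reference numbers.
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if not w.isdigit()]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps terms shared by every description from vanishing.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else {})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Vectors are unit length, so the dot product is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def cluster_positions(descriptions: list[str], threshold: float = 0.5) -> list[list[int]]:
    vecs = tfidf_vectors(descriptions)
    parent = list(range(len(descriptions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two clusters
    groups: dict[int, list[int]] = {}
    for i in range(len(descriptions)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())  # each group of indices is one position
```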

Step 9 — Background Analysis

Two background tasks run during the pipeline, and their results are collected at the end:
AI Deep Research: Started immediately after account metadata extraction (Step 1). Uses the extracted business name, address, phone, and principal names to search the web and verify the business’s legitimacy. Runs in the background for the rest of the pipeline. The result is the “AI Deep Research” card on the Synopsis page.
Tampering Analysis: Started after reconciliation (Step 6). Examines the PDF metadata of every uploaded document (producer, creator application, creation dates, modification dates) and looks for signs of fabrication or programmatic generation (e.g. all PDFs having identical metadata, timestamps that are impossibly close together, or creation tools not typically used by banks). Runs in the background during tagging and position detection. The result is the “Tampering Analysis” card on the Synopsis page.
Both tasks are best-effort. If either one fails, the parse still completes normally.
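Starting both tasks early and collecting them at the end, without letting a failure sink the parse, maps naturally onto `asyncio.create_task` plus `asyncio.gather(..., return_exceptions=True)`. A sketch; the injected task functions are hypothetical placeholders, and representing a failed task as `None` is an assumption about how a missing Synopsis card is handled.

```python
import asyncio
from typing import Awaitable, Callable


async def parse_book(
    run_research: Callable[[], Awaitable[str]],
    run_tampering: Callable[[], Awaitable[str]],
) -> dict:
    research = asyncio.create_task(run_research())  # started right after Step 1
    # ... Steps 2-6 (through reconciliation) would run here ...
    tampering = asyncio.create_task(run_tampering())  # started after Step 6
    # ... Steps 7-8 (tagging, position detection) would run here ...
    results = await asyncio.gather(research, tampering, return_exceptions=True)
    # Best effort: a failed task yields None instead of failing the parse.
    research_out, tampering_out = [None if isinstance(r, BaseException) else r for r in results]
    return {"status": "complete", "deep_research": research_out, "tampering": tampering_out}
```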

Key Concepts

Book: A container for one deal or submission. A book holds one or more uploaded documents and the parsed results. When you upload files and click Parse, you’re parsing a book.
Document: A single uploaded PDF file. Gets classified into a document type (bank statement, credit report, etc.) during parsing.
Ledger: One bank account within one statement period. A single PDF can produce multiple ledgers if it contains data for multiple accounts. Each ledger has a starting balance, ending balance, and a list of transactions. Reconciliation happens at the ledger level: each ledger is independently verified.
Account: A bank account that spans across statement periods. After parsing, LendPathway merges all ledgers for the same account into a single unified transaction history. If you upload 12 monthly statements for the same checking account, you get 12 ledgers but 1 account.
Position: A detected debt relationship with a specific lender. For example, if the parser identifies regular payments to “Prime Funding LLC,” it creates a position grouping those transactions together with a funder name, loan type, total disbursed, and total paid.
Tag: A label applied to a transaction that identifies what type of activity it represents. Tags are applied automatically during parsing and can be manually edited afterward. A transaction can have multiple tags (e.g. a wire payment to a lender could be tagged both “Wire” and “Merchant Cash Advance”).
Reconciliation: The process of mathematically verifying that extracted transactions match the bank’s own records. The starting balance plus all credits, minus all debits, should equal the ending balance. A reconciled ledger means the computed ending balance is within $0.05 of the ending balance the bank reported.
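The ledger-to-account relationship can be sketched as a merge keyed on account ID: ledgers for the same account are ordered by statement start date and their transactions concatenated. The field names are hypothetical, and transactions are plain strings here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Ledger:  # hypothetical: one account within one statement period
    account_id: str
    period_start: date
    transactions: list[str] = field(default_factory=list)


def merge_into_accounts(ledgers: list[Ledger]) -> dict[str, list[str]]:
    """One unified transaction history per account, in statement order."""
    by_account: dict[str, list[Ledger]] = defaultdict(list)
    for ledger in ledgers:
        by_account[ledger.account_id].append(ledger)
    return {
        acct: [t for lg in sorted(group, key=lambda x: x.period_start) for t in lg.transactions]
        for acct, group in by_account.items()
    }
```

With 12 monthly ledgers for one checking account, this yields a single entry whose history runs from the January statement through December.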