When a parse job is initiated, LendPathway runs a multi-step pipeline that turns raw PDFs into structured, analyzed financial data. This page explains exactly what happens at each step.
The Pipeline
Every parse follows this sequence:
1. Load Documents
All uploaded files are pulled from storage. Only PDF files are accepted — any other file type (images, spreadsheets, Word docs, etc.) is immediately marked as failed.
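As a rough illustration of the file-type gate described above, the sketch below partitions uploads by extension. The function name and the case-insensitive extension check are assumptions for illustration; the actual service may inspect content type rather than the filename.

```python
from pathlib import PurePath


def split_by_file_type(filenames):
    """Partition uploads into accepted PDFs and files that fail immediately."""
    accepted, failed = [], []
    for name in filenames:
        # Assumption: a file counts as a PDF if its extension is .pdf (any case).
        if PurePath(name).suffix.lower() == ".pdf":
            accepted.append(name)
        else:
            failed.append(name)
    return accepted, failed


accepted, failed = split_by_file_type(["jan.pdf", "scan.PNG", "ledger.xlsx", "Feb.PDF"])
```

Here `accepted` holds the two PDFs and `failed` holds the image and the spreadsheet.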
2. Classify Each Document
Each PDF is sent to an AI model that reads the document and identifies what kind of financial document it is. Classification runs in parallel — all PDFs are classified at the same time.
The classifier can identify these document types:
| Type | What it looks for |
|---|---|
| Bank Statement | Transactions, deposits, withdrawals, account balances |
| Credit Report | Experian / Equifax / TransUnion, credit scores, tradelines |
| Tax Form | IRS forms (1040, 1065, 1120, 1120-S, Schedule C, Schedule E, K-1, etc.) |
| Loan Application | Application form for financing |
| Receipt | Purchase receipt, business expense |
| DecisionLogic | Report from decisionlogic.com |
| AR Report | Accounts receivable report |
| Photo ID | Government-issued ID (driver’s license, passport) |
| Voided Check | Bank check showing account details |
Documents that don’t match any type are marked Unsupported.
Of these, four have full parsing pipelines: Bank Statement, Credit Report, Tax Form, and Loan Application. The others (Receipt, Photo ID, Voided Check, etc.) are classified and stored but not parsed further.
3. Run Type-Specific Pipelines
Based on classification, LendPathway runs the appropriate parser for each document type — in parallel. If you upload a mix of bank statements, a credit report, and a loan application, all three pipelines run simultaneously.
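The dispatch described above can be sketched with `asyncio`. This is a minimal sketch, not LendPathway's implementation: `run_pipeline` is a hypothetical stand-in for a real parser, and only the four document types listed earlier as having full pipelines are dispatched.

```python
import asyncio
from collections import defaultdict

# The four document types the text says have full parsing pipelines.
PARSED_TYPES = {"Bank Statement", "Credit Report", "Tax Form", "Loan Application"}


async def run_pipeline(doc_type, docs):
    """Hypothetical parser stand-in: returns a count instead of parsed data."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound parser would
    return doc_type, len(docs)


async def run_type_pipelines(classified):
    """classified: list of (filename, doc_type) pairs from the classifier."""
    by_type = defaultdict(list)
    for filename, doc_type in classified:
        if doc_type in PARSED_TYPES:  # other types are stored but not parsed
            by_type[doc_type].append(filename)
    # Start one pipeline per document type and wait for all of them together.
    results = await asyncio.gather(*(run_pipeline(t, d) for t, d in by_type.items()))
    return dict(results)


summary = asyncio.run(run_type_pipelines([
    ("jan.pdf", "Bank Statement"),
    ("feb.pdf", "Bank Statement"),
    ("report.pdf", "Credit Report"),
    ("id.pdf", "Photo ID"),
]))
```

In this example the Photo ID is classified but never reaches a pipeline, while the bank statement and credit report pipelines run concurrently.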
Bank Statement Pipeline
The bank statement pipeline is the most complex. Here’s what happens inside it, step by step.
Step 1 — Account Metadata Extraction
The AI reads all bank statement PDFs together in a single call and extracts:
- Business identity — business name, address, phone, tax ID
- Principals — owner names, roles, addresses, phone numbers
- Account ledgers — every distinct bank account across all documents (account name, account number, bank name, routing number)
Each account is assigned a unique ID. This step establishes the map of accounts that the rest of the pipeline uses.
If no bank accounts are found at all, the parse fails here.
After this step, two things happen immediately:
- The book name is updated to the extracted business name
- AI Deep Research is kicked off in the background (more on this below)
Step 2 — Statement Metadata Extraction
For each individual PDF (in parallel), the AI extracts statement-level metadata:
- Statement start and end dates
- Starting and ending balances, per account
- Which accounts appear in this specific document
This is also where LendPathway figures out the date range for the book (e.g. “3 accounts, Jan 2024 to Dec 2024”).
Step 3 — Duplicate Detection
Only runs when there are 2 or more bank statement PDFs.
The AI compares all statements and identifies redundant documents — complete duplicates or documents that are subsets of another (e.g. someone uploaded both a full 3-page statement and a 1-page summary of the same month). Redundant documents are marked as failed and removed before transaction extraction, preventing double-counted data.
Step 4 — Transaction Extraction
For each remaining statement (in parallel), the AI extracts every individual transaction:
- Date
- Description
- Amount
- Type (credit or debit)
This is the most computationally intensive step. If a document is too large for a single extraction call (hits token limits), LendPathway automatically falls back to chunked extraction — pulling transactions in batches of ~100 at a time, up to 20 chunks, and merging the results. Each chunk receives context about where the previous chunk left off to avoid gaps.
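The chunked fallback can be pictured as a bounded loop that hands each call the last transaction seen so far. The sketch below is illustrative only: `extract_chunk` is a hypothetical callable standing in for the AI extraction call, and treating a short chunk as "end of document" is an assumption.

```python
CHUNK_SIZE = 100  # "~100 at a time" per the text
MAX_CHUNKS = 20   # upper bound per the text


def extract_in_chunks(extract_chunk):
    """Collect transactions chunk by chunk.

    extract_chunk(after, limit) is a hypothetical callable returning up to
    `limit` transactions that follow transaction `after` (None means start).
    """
    transactions = []
    last_seen = None
    for _ in range(MAX_CHUNKS):
        chunk = extract_chunk(last_seen, CHUNK_SIZE)
        if not chunk:
            break
        transactions.extend(chunk)
        last_seen = chunk[-1]        # context passed to the next chunk
        if len(chunk) < CHUNK_SIZE:  # assumption: a short chunk ends the document
            break
    return transactions


# Fake extractor over 250 numbered transactions, for illustration only.
ALL_TRANSACTIONS = list(range(1, 251))


def fake_extract(after, limit):
    start = 0 if after is None else ALL_TRANSACTIONS.index(after) + 1
    return ALL_TRANSACTIONS[start:start + limit]


result = extract_in_chunks(fake_extract)
```

With 250 transactions the loop makes three calls (100, 100, 50) and the merged result has no gaps or repeats.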
Step 5 — Assembly
The extracted metadata and transactions are merged together into ledgers. A ledger is one account’s data within one statement document — its starting balance, ending balance, and list of transactions. A single PDF can produce multiple ledgers if it contains multiple accounts.
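One way to picture a ledger is as a record keyed by document and account. The field names and the `assemble` helper below are assumptions made for illustration, not LendPathway's actual schema.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Ledger:
    """One account's data within one statement document."""
    document_id: str
    account_id: str
    starting_balance: Optional[Decimal]
    ending_balance: Optional[Decimal]
    transactions: List[dict] = field(default_factory=list)


def assemble(metadata, transactions):
    """Merge per-account statement metadata with extracted transactions."""
    ledgers = {
        (m["document_id"], m["account_id"]): Ledger(
            m["document_id"], m["account_id"],
            m["starting_balance"], m["ending_balance"],
        )
        for m in metadata
    }
    for txn in transactions:
        ledgers[(txn["document_id"], txn["account_id"])].transactions.append(txn)
    return list(ledgers.values())


# One PDF that contains two accounts yields two ledgers.
metadata = [
    {"document_id": "stmt-jan.pdf", "account_id": "acct-1",
     "starting_balance": Decimal("100.00"), "ending_balance": Decimal("150.00")},
    {"document_id": "stmt-jan.pdf", "account_id": "acct-2",
     "starting_balance": Decimal("0.00"), "ending_balance": Decimal("25.00")},
]
transactions = [
    {"document_id": "stmt-jan.pdf", "account_id": "acct-1",
     "amount": Decimal("50.00"), "type": "credit"},
    {"document_id": "stmt-jan.pdf", "account_id": "acct-2",
     "amount": Decimal("25.00"), "type": "credit"},
]
ledgers = assemble(metadata, transactions)
```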
Step 6 — Reconciliation
After assembly, LendPathway reconciles each ledger independently (all ledgers in parallel). This is the mathematical verification step.
The formula:
Starting Balance + Sum of All Credits − Sum of All Debits = Computed Ending Balance
The computed ending balance is compared against the ending balance printed on the statement. If they match, the ledger is reconciled — meaning the extracted transactions are a mathematically faithful representation of the bank’s own records.
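The formula can be worked through in a few lines. This is a sketch under the assumption that each transaction is an `(amount, type)` pair; `Decimal` is used so that cents are not distorted by floating-point rounding.

```python
from decimal import Decimal


def computed_ending_balance(starting_balance, transactions):
    """transactions: list of (amount, type) with type 'credit' or 'debit'."""
    credits = sum((a for a, t in transactions if t == "credit"), Decimal("0"))
    debits = sum((a for a, t in transactions if t == "debit"), Decimal("0"))
    return starting_balance + credits - debits


txns = [
    (Decimal("500.00"), "credit"),
    (Decimal("120.25"), "debit"),
    (Decimal("30.00"), "debit"),
]
ending = computed_ending_balance(Decimal("1000.00"), txns)
# 1000.00 + 500.00 - (120.25 + 30.00) = 1349.75
```

If the statement prints an ending balance of 1349.75, this ledger reconciles.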
If it doesn’t reconcile on the first check, LendPathway enters a retry loop (up to 3 attempts). On each attempt, the AI receives:
- The original PDF (ground truth)
- The current list of extracted transactions as a CSV
- The current discrepancy amount and direction (too high or too low)
- If the statement has multiple accounts, a note about which account is being reconciled
The AI compares the extracted transactions against the PDF and can make three types of corrections:
- Flip a transaction’s type — if a credit was mistakenly extracted as a debit (or vice versa), flip it
- Remove a transaction — if a duplicate or nonexistent transaction was extracted
- Add a missing transaction — if a transaction visible in the PDF wasn’t extracted
The AI is instructed to only make corrections it can clearly verify in the PDF. It will never fabricate transactions to force the math to work. If the extraction looks correct but the math still doesn’t add up (e.g. the bank’s own statement has an internal discrepancy), the AI gives up and explains why.
After corrections are applied, the balance is rechecked. If it’s within $0.05, the ledger is reconciled. If not, the next attempt runs. After 3 failed attempts (or if the AI gives up), the ledger is marked not reconciled with an explanation of what went wrong.
Reconciliation is skipped entirely if the starting or ending balance couldn’t be extracted from the statement.
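The retry loop described above can be sketched as follows. `propose_corrections` is a hypothetical stand-in for the AI call (it returns a corrected transaction list, or `None` to give up). Applying the $0.05 tolerance inclusively, and also on the first check, are assumptions.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.05")
MAX_ATTEMPTS = 3


def discrepancy(start, end, txns):
    """Computed minus printed ending balance; positive means too high."""
    total = start
    for amount, kind in txns:
        total += amount if kind == "credit" else -amount
    return total - end


def reconcile(start, end, txns, propose_corrections):
    if start is None or end is None:
        return "skipped", txns  # a balance could not be extracted
    if abs(discrepancy(start, end, txns)) <= TOLERANCE:
        return "reconciled", txns
    for _ in range(MAX_ATTEMPTS):
        corrected = propose_corrections(txns, discrepancy(start, end, txns))
        if corrected is None:  # the corrector gives up
            return "not_reconciled", txns
        txns = corrected
        if abs(discrepancy(start, end, txns)) <= TOLERANCE:
            return "reconciled", txns
    return "not_reconciled", txns


def flip_first(txns, diff):
    """Toy corrector: flips the type of the first transaction."""
    amount, kind = txns[0]
    return [(amount, "credit" if kind == "debit" else "debit")] + txns[1:]


status, fixed = reconcile(
    Decimal("100"), Decimal("150"), [(Decimal("50"), "debit")], flip_first
)
```

In the example, a credit of 50 was mis-extracted as a debit; one flip brings the computed balance to 150 and the ledger reconciles on the first attempt.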
Step 7 — Tagging
After reconciliation, all transactions across all ledgers are assigned a global sequential ID (1, 2, 3, …) and then tagged. Tagging runs three parallel processes simultaneously:
AI Loan Tagging — The AI classifies transaction groups into debt/loan types:
| Tag | Display Name |
|---|---|
| merchant_cash_advance | Merchant Cash Advance |
| bank_loan | Bank Loan |
| factoring | Factoring |
| credit | Credit Card |
| lease | Lease |
| auto | Auto Loan |
| mortgage | Mortgage |
| buy_now_pay_later | Buy Now Pay Later |
| debt_collection | Debt Collection |
Each transaction can have at most one loan tag.
AI Core Tagging — The AI classifies transaction groups into activity categories. The AI receives business identity and account context to make accurate calls (e.g. knowing the business name helps identify internal transfers vs external payments):
| Tag | Display Name |
|---|---|
| internal_transfer | Internal Transfer |
| owner_transaction | Owner Transaction |
| payment_processor | Payment Processor |
| bank_fee | Bank Fee |
| bank_interest | Bank Interest |
| reversal | Reversal |
| cash | Cash |
A transaction can have multiple core tags.
Deterministic Pattern Tagging — Rule-based regex matching (no AI involved) that identifies:
| Tag | Display Name |
|---|---|
| check | Check |
| wire | Wire |
| peer_to_peer | P2P |
| stop_payment | Stop Payment |
| nsf | NSF |
| overdraft | Overdraft |
NSF and overdraft tags are only applied to debits. A transaction can have multiple deterministic tags.
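A rule-based tagger of this kind might look like the sketch below. The patterns are invented for illustration and are not LendPathway's actual rules; only the tag names and the debit-only restriction come from the text above.

```python
import re

# Illustrative patterns only; the product's real rules are not documented here.
PATTERNS = {
    "check": re.compile(r"\bcheck\b", re.I),
    "wire": re.compile(r"\bwire\b", re.I),
    "peer_to_peer": re.compile(r"\b(?:zelle|venmo)\b", re.I),
    "stop_payment": re.compile(r"\bstop\s+payment\b", re.I),
    "nsf": re.compile(r"\bnsf\b|insufficient funds", re.I),
    "overdraft": re.compile(r"\boverdraft\b", re.I),
}
DEBIT_ONLY = {"nsf", "overdraft"}


def deterministic_tags(description, txn_type):
    """Return every pattern tag that matches; a transaction may get several."""
    tags = []
    for tag, pattern in PATTERNS.items():
        if tag in DEBIT_ONLY and txn_type != "debit":
            continue  # NSF and overdraft apply to debits only
        if pattern.search(description):
            tags.append(tag)
    return tags


fee_tags = deterministic_tags("NSF FEE CHECK 1042", "debit")
refund_tags = deterministic_tags("NSF REVERSAL", "credit")
```

The first call matches both `check` and `nsf`; the second returns no tags because the NSF rule is skipped for credits.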
All three tag types are then merged onto each transaction: loan tag first (if any), then core tags, then deterministic tags.
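The merge order can be sketched as a simple concatenation per transaction ID. The input shapes (dictionaries keyed by the global transaction ID) are assumptions for illustration.

```python
def merge_tags(txn_ids, loan_tags, core_tags, pattern_tags):
    """loan_tags: {txn_id: tag}; core_tags and pattern_tags: {txn_id: [tags]}."""
    merged = {}
    for txn_id in txn_ids:
        tags = []
        if txn_id in loan_tags:
            tags.append(loan_tags[txn_id])         # at most one loan tag, first
        tags.extend(core_tags.get(txn_id, []))     # then core tags
        tags.extend(pattern_tags.get(txn_id, []))  # then deterministic tags
        merged[txn_id] = tags
    return merged


merged = merge_tags(
    [1, 2, 3],
    loan_tags={1: "merchant_cash_advance"},
    core_tags={2: ["bank_fee"]},
    pattern_tags={1: ["wire"], 2: ["nsf"]},
)
```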
Step 8 — Position Detection
Positions are detected from the loan-tagged transactions. There are two methods depending on loan type:
MCA Positions (AI-based) — For Merchant Cash Advance transactions, an AI model matches transaction groups to known funders from your org’s funder registry. Each position gets a funder name, loan type, and the set of transaction IDs that belong to it. Funders from your registry include metadata like favicon, contact info, and website.
Other Loan Positions (algorithmic) — For all other loan types (Bank Loan, Factoring, Auto Loan, Lease, Mortgage, Debt Collection, Buy Now Pay Later), positions are detected using text similarity clustering. Transaction descriptions are converted to TF-IDF vectors (a weighting that emphasizes the words most distinctive to each description), compared for similarity, and grouped into clusters. Each cluster becomes a position.
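To make the clustering idea concrete, here is a minimal pure-Python sketch: TF-IDF vectors, cosine similarity, and a greedy threshold-based grouping. The tokenizer (letters only, so varying reference numbers are ignored), the smoothed IDF formula, the 0.5 threshold, and the greedy strategy are all assumptions; the clustering method LendPathway actually uses is not specified here.

```python
import math
import re
from collections import Counter


def tfidf_vectors(descriptions):
    """Return one L2-normalized {term: weight} vector per description."""
    docs = [Counter(re.findall(r"[a-z]+", d.lower())) for d in descriptions]
    n = len(docs)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(descriptions, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    vectors = tfidf_vectors(descriptions)
    clusters = []  # each cluster is a list of indices into descriptions
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


positions = cluster([
    "ACH DEBIT FORD MOTOR CREDIT PAYMENT 1001",
    "ACH DEBIT FORD MOTOR CREDIT PAYMENT 1002",
    "WELLS FARGO MORTGAGE PMT",
    "WELLS FARGO MORTGAGE PMT",
])
```

The two auto-loan payments land in one cluster and the two mortgage payments in another, so this example would produce two positions.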
Step 9 — Background Analysis
Two background tasks run during the pipeline and are collected at the end:
AI Deep Research — Started immediately after account metadata extraction (Step 1). Uses the extracted business name, address, phone, and principal names to search the web and verify the business’s legitimacy. Runs in the background during the entire rest of the pipeline. The result is the “AI Deep Research” card on the Synopsis page.
Tampering Analysis — Started after reconciliation (Step 6). Examines the PDF metadata of every uploaded document — producer, creator application, creation dates, modification dates — and looks for signs of fabrication or programmatic generation (e.g. all PDFs having identical metadata, timestamps that are impossibly close together, or creation tools not typically used by banks). Runs in the background during tagging and position detection. The result is the “Tampering Analysis” card on the Synopsis page.
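Two of the signals mentioned above (identical metadata and implausibly close timestamps) can be sketched as simple checks over metadata records. The field names, signal names, and the 5-second threshold are hypothetical, and reading the metadata out of the PDFs is outside the scope of this sketch.

```python
from datetime import datetime, timedelta


def tampering_signals(docs, min_gap=timedelta(seconds=5)):
    """docs: list of dicts with 'producer', 'creator', 'created' (datetime)."""
    signals = []
    if len(docs) >= 2:
        fingerprints = {(d["producer"], d["creator"], d["created"]) for d in docs}
        if len(fingerprints) == 1:
            signals.append("identical_metadata")
        created = sorted(d["created"] for d in docs)
        # Flag any two consecutive creation times closer than min_gap.
        if any(b - a < min_gap for a, b in zip(created, created[1:])):
            signals.append("creation_times_too_close")
    return signals


suspicious = tampering_signals([
    {"producer": "ExamplePDF", "creator": "ExampleTool",
     "created": datetime(2024, 3, 1, 12, 0, 0)},
    {"producer": "ExamplePDF", "creator": "ExampleTool",
     "created": datetime(2024, 3, 1, 12, 0, 2)},
])
normal = tampering_signals([
    {"producer": "ExamplePDF", "creator": "ExampleTool",
     "created": datetime(2024, 1, 31, 23, 0, 0)},
    {"producer": "ExamplePDF", "creator": "ExampleTool",
     "created": datetime(2024, 2, 29, 23, 0, 0)},
])
```

Statements supposedly issued a month apart but created two seconds apart raise a flag; statements created a month apart do not.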
Both tasks are best-effort. If either one fails, the parse still completes normally.
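The best-effort behavior can be sketched with `asyncio` tasks whose failures are converted into an empty result. The function names and result shapes are hypothetical; the point is only that a failed background task does not fail the parse.

```python
import asyncio


async def best_effort(coro):
    """Await a background job and turn any failure into None."""
    try:
        return await coro
    except Exception:
        return None


async def deep_research():
    await asyncio.sleep(0)
    raise RuntimeError("web search unavailable")  # simulated failure


async def tampering_analysis():
    await asyncio.sleep(0)
    return {"flags": []}


async def parse():
    # Started early; runs in the background while the pipeline continues.
    research = asyncio.create_task(best_effort(deep_research()))
    # ... the remaining pipeline steps would run here ...
    tampering = asyncio.create_task(best_effort(tampering_analysis()))
    # Both results are collected at the end; a None result is tolerated.
    return {
        "status": "complete",
        "research": await research,
        "tampering": await tampering,
    }


result = asyncio.run(parse())
```

Here the research task fails, yet the parse still reports `complete` with an empty research result.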
Key Concepts
Book — A container for one deal or submission. A book holds one or more uploaded documents and the parsed results. When you upload files and click Parse, you’re parsing a book.
Document — A single uploaded PDF file. Gets classified into a document type (bank statement, credit report, etc.) during parsing.
Ledger — One bank account within one statement period. A single PDF can produce multiple ledgers if it contains data for multiple accounts. Each ledger has a starting balance, ending balance, and a list of transactions. Reconciliation happens at the ledger level — each ledger is independently verified.
Account — A bank account that spans across statement periods. After parsing, LendPathway merges all ledgers for the same account into a single unified transaction history. If you upload 12 monthly statements for the same checking account, you get 12 ledgers but 1 account.
Position — A detected debt relationship with a specific lender. For example, if the parser identifies regular payments to “Prime Funding LLC,” it creates a position grouping those transactions together with a funder name, loan type, total disbursed, and total paid.
Tag — A label applied to a transaction that identifies what type of activity it represents. Tags are applied automatically during parsing and can be manually edited afterward. A transaction can have multiple tags (e.g. a wire payment to a lender could be tagged both “Wire” and “Merchant Cash Advance”).
Reconciliation — The process of mathematically verifying that extracted transactions match the bank’s own records. Starting balance plus the sum of all credits, minus the sum of all debits, should equal the ending balance. A reconciled ledger means the computed ending balance is within $0.05 of what the bank reported.
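The relationship between ledgers and accounts defined above can be illustrated with a short sketch: twelve monthly ledgers for the same checking account collapse into one account history. The dictionary shapes and the date-based sort are assumptions for illustration.

```python
from collections import defaultdict


def merge_ledgers(ledgers):
    """Group ledgers by account and return one sorted history per account.

    ledgers: list of dicts with 'account_id' and 'transactions', where each
    transaction is a dict carrying an ISO-formatted 'date' string.
    """
    accounts = defaultdict(list)
    for ledger in ledgers:
        accounts[ledger["account_id"]].extend(ledger["transactions"])
    for txns in accounts.values():
        txns.sort(key=lambda t: t["date"])  # ISO dates sort chronologically
    return dict(accounts)


# Twelve monthly statements for one account, supplied in reverse order.
ledgers = [
    {"account_id": "chk-1",
     "transactions": [{"date": f"2024-{month:02d}-15", "amount": 10}]}
    for month in range(12, 0, -1)
]
accounts = merge_ledgers(ledgers)
```

The result is 12 ledgers in, 1 account out, with the transactions in chronological order.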