Migrating from Docparser, Nanonets, or Rossum: a practical guide

We get this question constantly: _we're already running [Docparser / Nanonets / Rossum / Mindee / Azure Form Recognizer / AWS Textract], so what does the actual migration look like?_

Honest answer: easier than rebuilding a template library, harder than flipping a switch. This post is the playbook we share with prospects who want to validate Docusift against an existing extractor without committing first.

What you keep

Most of the integration work you've already done transfers as-is. Specifically:

- Your downstream pipeline. The accounting sync (QuickBooks, Xero), the database writes, the alerting on failed extractions: none of it cares which extractor produced the JSON, only the shape of the JSON.
- Your reviewer workflow. Whatever tool your ops team uses to fix mis-extractions still works. Docusift's Review queue is one option, but if you've built your own review UI we don't fight it.
- Your historical data. No re-extraction needed. Past invoices stay where they are.

What changes

Three things, in roughly increasing order of effort:

1. The API call

Replace your existing extractor's upload call with POST /api/v1/extract:

curl -X POST https://docusift.co/api/v1/extract \
  -H "X-API-Key: ds_live_…" \
  -F "file=@invoice.pdf"

Response:

{
  "data": [
    {
      "id": "doc_01HZ…",
      "file_name": "invoice.pdf",
      "status": "uploaded"
    }
  ],
  "errors": []
}

Then either poll GET /api/v1/extract/:id until processing finishes, or register a webhook and we push the result when the document.processed event fires. That's the whole surface area.
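The polling route can be sketched in TypeScript as below. This is a minimal sketch under assumptions, not an official client: `pollUntilDone`, `backoffMs`, and the `StatusFetcher` type are hypothetical names, and the only status value confirmed by the response above is "uploaded". The "processing" entry in the pending list is a guess; replace it with whatever in-flight statuses the API actually returns. The HTTP call is injected so the loop can be exercised without a network.

```typescript
// Hypothetical status payload: only `id` and `status` are relied on here.
interface DocStatus {
  id: string;
  status: string;
}

// Injected so the loop is testable; in production this would wrap
// GET /api/v1/extract/:id with your X-API-Key header.
type StatusFetcher = (id: string) => Promise<DocStatus>;

interface PollOptions {
  maxAttempts?: number;
  pendingStatuses?: string[]; // assumption: adjust to the API's real in-flight statuses
  sleep?: (ms: number) => Promise<void>;
}

// Exponential backoff: 1s, 2s, 4s, ... capped at maxMs.
function backoffMs(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Polls until the document leaves the pending statuses, or throws after maxAttempts.
async function pollUntilDone(
  id: string,
  fetchStatus: StatusFetcher,
  options: PollOptions = {},
): Promise<DocStatus> {
  const maxAttempts = options.maxAttempts ?? 10;
  const pending = options.pendingStatuses ?? ["uploaded", "processing"];
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const doc = await fetchStatus(id);
    if (pending.indexOf(doc.status) === -1) {
      return doc; // no longer in flight: processed, failed, or similar
    }
    await sleep(backoffMs(attempt));
  }
  throw new Error(`Document ${id} still pending after ${maxAttempts} polls`);
}
```

If you already run a public endpoint, the webhook is usually the better choice for production traffic; polling is the quicker way to get a validation script working.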

2. The field mapping

This is where most migration time actually goes. Different extractors return slightly different field names. A migration script that translates the old vendor's field names to Docusift's is usually 100-200 lines:

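// Translate one legacy extraction into Docusift's field names.
// LegacyResult and DocusiftFields are assumed to be defined elsewhere;
// the legacy keys below are examples.
function mapToDocusiftSchema(legacy: LegacyResult): DocusiftFields {
  return {
    // Try each spelling the old vendor may have used; the first match wins.
    invoice_number: legacy.invoiceNo ?? legacy.invoice_number ?? legacy.docNum,
    vendor_name: legacy.supplier ?? legacy.vendor,
    // Caution: a missing total becomes 0 here, and parseFloat stops at the
    // first non-numeric character, so "1,234.56" parses as 1.
    total_amount: parseFloat(legacy.total ?? legacy.amount ?? '0'),
    // Defaulting to USD is a guess; drop it if your documents are multi-currency.
    currency: legacy.currency ?? 'USD',
    line_items: legacy.lineItems?.map(/* … */) ?? [],
  };
}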

Run this script once over a sample of historical extractions to confirm the mapping. Then run live traffic through both extractors in parallel for a week and diff the outputs. Where they disagree, hand-audit a sample to see which one was right — usually it's the model-based one, but not always.
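The parallel-run diff can be as simple as a field-by-field comparison after light normalization. The sketch below assumes both results have already been mapped to the same flat field names; `diffFields`, `normalize`, and the two-decimal rounding are illustrative choices, not part of any Docusift SDK. Line items are left out because they need their own row-matching logic.

```typescript
type FieldValue = string | number | null | undefined;
type FlatFields = Record<string, FieldValue>;

interface FieldDiff {
  field: string;
  legacy: FieldValue;
  docusift: FieldValue;
}

// Normalize so cosmetic differences (case, whitespace, float noise)
// don't count as disagreements.
function normalize(value: FieldValue): string {
  if (value === null || value === undefined) return "";
  if (typeof value === "number") return value.toFixed(2);
  return value.trim().toLowerCase();
}

// Returns one entry per field where the two extractors disagree.
function diffFields(legacy: FlatFields, docusift: FlatFields): FieldDiff[] {
  const names = Object.keys({ ...legacy, ...docusift }).sort();
  const diffs: FieldDiff[] = [];
  for (const field of names) {
    if (normalize(legacy[field]) !== normalize(docusift[field])) {
      diffs.push({ field, legacy: legacy[field], docusift: docusift[field] });
    }
  }
  return diffs;
}

// Made-up sample values for illustration.
const diffs = diffFields(
  { invoice_number: "INV-1042 ", total_amount: 100, currency: undefined },
  { invoice_number: "inv-1042", total_amount: 100.004, currency: "USD" },
);
console.log(diffs.map((d) => d.field)); // only "currency" differs
```

Every entry in the result is a candidate for the hand audit: open the original PDF and record which side was right.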

3. The threshold tuning

This is the conceptual shift, not the technical one. Template-based tools either succeed or fail per field. There's no "the model wasn't sure." Docusift gives you a confidence score per extraction and expects you to set an auto-approve threshold.

Realistic settings during migration:

- Week 1: threshold 0.99. Almost everything goes to Review. You're not trusting the model yet, you're calibrating.
- Week 2: threshold 0.95. Auto-approve the high-confidence cases. Watch which docs land in review; they should be the ones that actually look ambiguous.
- Week 3: threshold 0.92 (default). Most teams settle here within a month.

If you're tempted to set the threshold lower than 0.88, take a hard look at what's landing in Review first. Usually it's a real input-quality issue (poor scans, a vendor with weird formatting) that the model is correctly flagging.
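In code, the threshold is a single comparison. The sketch below mirrors the week-by-week schedule above; `routeByConfidence`, `thresholdForWeek`, and `reviewShare` are hypothetical helper names, the batch values are made up, and treating a score exactly equal to the threshold as auto-approved is an assumption about how the boundary is handled.

```typescript
type Route = "auto_approve" | "review";

// Assumption: a score exactly at the threshold is auto-approved.
function routeByConfidence(confidence: number, threshold: number): Route {
  return confidence >= threshold ? "auto_approve" : "review";
}

// Ramp from the migration schedule: 0.99, then 0.95, then the 0.92 default.
function thresholdForWeek(week: number): number {
  if (week <= 1) return 0.99;
  if (week === 2) return 0.95;
  return 0.92;
}

// Fraction of a batch that would land in Review at a given threshold.
function reviewShare(confidences: number[], threshold: number): number {
  if (confidences.length === 0) return 0;
  const inReview = confidences.filter(
    (c) => routeByConfidence(c, threshold) === "review",
  );
  return inReview.length / confidences.length;
}

const batch = [0.99, 0.96, 0.93, 0.8];
for (const week of [1, 2, 3]) {
  const share = reviewShare(batch, thresholdForWeek(week));
  console.log(`week ${week}: ${share * 100}% to Review`);
}
```

For this sample batch the Review share falls from 75% to 50% to 25% across the three weeks, which is the shape you want to see: the queue shrinks while the documents left in it are the ambiguous ones.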

The one-week validation plan

If you want to gut-check Docusift before any commitment, here's the plan we hand prospects:

Day 1. Sign up. Mint an API key under Settings → API Keys. The free tier gives you 100 pages, enough for the audit if your sample documents are mostly single-page.

Day 2. Pick 30-50 representative documents — a mix of vendors, layouts, and quality. Run them through both your existing extractor and POST /api/v1/extract. Save the JSON from each.

Day 3. Diff the outputs. For each field that differs, eyeball the original PDF and decide which extractor was right. Score it.

Day 4. Repeat with 20-30 _edge case_ documents: bad scans, handwritten invoices, vendors you've never seen. This is where template-based tools usually fall over and template-free ones earn their keep.

Day 5. Look at the confidence scores Docusift returned. For the documents where it was right, what was the confidence? For the ones it was wrong, was the confidence lower? You're calibrating your threshold.

Day 6-7. Buffer for re-runs and any deeper investigation. By end of week you have a real number — empirical accuracy on _your_ documents — to base a decision on.
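The Day 3 scoring and Day 5 calibration reduce to a small table: for each candidate threshold, how many documents would auto-approve, and how many of those were wrong. A minimal sketch, assuming you've recorded one confidence score and one hand-audited right/wrong verdict per document; `ScoredDoc`, `calibrate`, and the sample numbers are made up for illustration.

```typescript
interface ScoredDoc {
  confidence: number; // score returned for the document
  correct: boolean; // your verdict after checking the original PDF
}

interface ThresholdReport {
  threshold: number;
  autoApproved: number;
  wrongAutoApproved: number; // errors that would have skipped Review
  sentToReview: number;
}

// Builds one report row per candidate threshold.
function calibrate(docs: ScoredDoc[], thresholds: number[]): ThresholdReport[] {
  return thresholds.map((threshold) => {
    const approved = docs.filter((d) => d.confidence >= threshold);
    return {
      threshold,
      autoApproved: approved.length,
      wrongAutoApproved: approved.filter((d) => !d.correct).length,
      sentToReview: docs.length - approved.length,
    };
  });
}

// Made-up sample: replace with your own audit results.
const audited: ScoredDoc[] = [
  { confidence: 0.99, correct: true },
  { confidence: 0.96, correct: true },
  { confidence: 0.93, correct: false },
  { confidence: 0.9, correct: true },
  { confidence: 0.85, correct: false },
];

const report = calibrate(audited, [0.99, 0.95, 0.92]);
console.log(report);
```

A reasonable starting threshold is the lowest one where `wrongAutoApproved` stays at a level you can tolerate. In the sample above, dropping from 0.95 to 0.92 auto-approves one more document, and that document is wrong.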

Common migration questions

"What about line items?" Both Docparser and Nanonets struggled with multi-page line items historically. Docusift handles them natively (line_items is a top-level array in the extracted JSON). For documents with complex line-item tables, we usually outperform the template-based competitors by a wide margin.

"What about my custom fields?" If you've trained a custom field on Nanonets or Rossum, the equivalent in Docusift is a per-workspace prompt extension we set up at onboarding (paid plans only). Most "custom fields" turn out to be a recognized field in the Docusift schema we just hadn't surfaced.

"What about cost?" Per-page pricing is competitive at every tier. The bigger savings is the reduction in template maintenance — most customers report cutting ops headcount on the extraction-curation team by 30-50%.

Want help running the validation?

The free extraction audit is the Day 2 and Day 3 steps, automated. Upload up to 10 sample documents and we send back the JSON we'd extract along with confidence scores. Use it to compare against your existing tool before you invest a week.

We've done this migration with customers coming off all five major template-based tools. Reach out at hello@docusift.co and we'll walk through your specific document mix.