Batching
Process large datasets by splitting them into configurable batches with deduplication, filtering, and automatic result aggregation.
How Batching Works
When a workflow step receives an array of items (leads, investors, URLs), batching splits it into smaller chunks, processes each batch independently, then aggregates the results. This prevents timeouts, manages API rate limits, and enables parallel processing.
The pipeline: Item Resolution → Batch Creation → Parallel Execution → Aggregation.
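The batch-creation and parallel-execution stages can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: `make_batches`, `run_batched`, `process_batch`, and `max_workers` are hypothetical names, and a thread pool is only one possible way to run batches in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def make_batches(items: list[Any], size: int = 25) -> list[list[Any]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batched(
    items: list[Any],
    process_batch: Callable[[list[Any]], list[Any]],
    size: int = 25,
    max_workers: int = 4,  # hypothetical knob, not a documented setting
) -> list[Any]:
    """Process each batch independently, then flatten the per-batch results."""
    batches = make_batches(items, size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches
        per_batch = list(pool.map(process_batch, batches))
    return [result for batch in per_batch for result in batch]
```

The last batch is simply shorter when the item count is not a multiple of the batch size, so 60 items with a size of 25 yield batches of 25, 25, and 10.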
Configuration
Enable batching on any step by adding a batchConfig object:
{
  "batchConfig": {
    "enabled": true,
    "size": 25,
    "sourceVariable": "leads",
    "idField": "email",
    "qualifiedOnly": false,
    "maxItems": 500
  }
}
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Set to true to turn on batching for the step |
| size | number | Items per batch (default: 25) |
| sourceVariable | string | Variable name containing the array to batch |
| idField | string | Field used for deduplication (e.g., "email") |
| qualifiedOnly | boolean | Filter to only items that passed prior scoring |
| maxItems | number | Maximum total items to process |
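To make the fields concrete, here is a rough sketch of how item resolution might apply `sourceVariable`, `qualifiedOnly`, and `maxItems` before any batches are created. The function name `resolve_items`, the `qualified` flag on each item, and the order of operations (filter first, then cap) are assumptions made for illustration; the documentation does not specify them.

```python
from typing import Any


def resolve_items(variables: dict[str, Any], batch_config: dict[str, Any]) -> list[dict[str, Any]]:
    """Pick the source array, optionally filter it, then cap its length."""
    # Look up the array named by sourceVariable; a missing variable yields no items.
    items = list(variables.get(batch_config["sourceVariable"], []))

    if batch_config.get("qualifiedOnly", False):
        # "qualified" is a hypothetical per-item flag set by a prior scoring step.
        items = [item for item in items if item.get("qualified")]

    max_items = batch_config.get("maxItems")
    if max_items is not None:
        # Items beyond the limit are dropped without raising an error.
        items = items[:max_items]
    return items
```

With `qualifiedOnly` set to true and `maxItems` set to 1, a list of three leads where only the first and third are qualified resolves to the first lead alone.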
Deduplication
When an idField is set, batching deduplicates items before processing. If duplicates exist, the item with the highest score is kept. This prevents re-processing the same lead or entity across batches.
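The keep-the-highest-score rule can be illustrated with a short Python sketch. It assumes every item is a dictionary that carries the `idField` key and a numeric `score` key; the function name `dedupe_by_id`, the `score` field name, and the tie-breaking behavior (the first item seen wins) are illustrative assumptions rather than documented behavior.

```python
from typing import Any


def dedupe_by_id(
    items: list[dict[str, Any]],
    id_field: str,
    score_field: str = "score",  # assumed name of the score field
) -> list[dict[str, Any]]:
    """Keep one item per id, preferring the item with the highest score."""
    best: dict[Any, dict[str, Any]] = {}
    for item in items:
        key = item[id_field]
        current = best.get(key)
        # Replace only on a strictly higher score, so ties keep the first item seen.
        if current is None or item.get(score_field, 0) > current.get(score_field, 0):
            best[key] = item
    # Dicts preserve insertion order, so output follows first appearance of each id.
    return list(best.values())
```

For example, two entries for `a@x.com` with scores 4 and 9 collapse into the single entry with score 9.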
Result Aggregation
After all batches complete, results are merged and metrics are computed automatically:
- Totals: total items processed, passed, failed
- Averages: mean score, median score
- Tier counts: distribution across score tiers (high, medium, low)
- Score distribution: histogram of scores across all items
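The metrics listed above could be computed along these lines. This is a sketch under stated assumptions: each result is a dictionary with a numeric `score` and a boolean `passed` (both hypothetical field names), the tier thresholds are borrowed from the investor-scoring example on this page (8 and above is high, 5 up to 8 is medium, below 5 is low), and the histogram uses one-point-wide buckets.

```python
import math
from collections import Counter
from statistics import mean, median
from typing import Any


def tier_for(score: float) -> str:
    """Map a score to a tier; thresholds are an assumption from the example."""
    if score >= 8:
        return "high"
    if score >= 5:
        return "medium"
    return "low"


def aggregate(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge per-item results into totals, averages, tiers, and a histogram."""
    scores = [r["score"] for r in results if r.get("score") is not None]
    passed = sum(1 for r in results if r.get("passed"))

    tier_counts = {"high": 0, "medium": 0, "low": 0}
    for score in scores:
        tier_counts[tier_for(score)] += 1

    # Histogram keyed by the whole-number bucket each score falls into.
    histogram = Counter(math.floor(score) for score in scores)

    return {
        "total": len(results),
        "passed": passed,
        "failed": len(results) - passed,
        # Averages are None when there are no scores to average (an assumption).
        "mean_score": mean(scores) if scores else None,
        "median_score": median(scores) if scores else None,
        "tier_counts": tier_counts,
        "score_distribution": dict(sorted(histogram.items())),
    }
```

An empty result list produces zero counts everywhere, with the two averages left as `None` instead of raising an error.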
Example: Scoring 2,000 Investors
An investor matching workflow processes a database of 2,000 investors with batching:
Step: "Score Investors"
  batchConfig:
    size: 50
    sourceVariable: "investors"
    idField: "investor_id"
    maxItems: 2000
Execution:
  40 batches of up to 50 investors each
  Deduplication removed 23 duplicates
  Processing time: 4m 12s
Aggregated results:
  Total scored: 1,977
  High tier (8+): 312 (15.8%)
  Medium tier (5-7): 891 (45.1%)
  Low tier (<5): 774 (39.1%)
  Mean score: 5.7
Edge Cases
- Empty arrays: Batching completes immediately with zero-count metrics
- Failed batches: Individual batch failures don't halt the entire run; failed items are reported in aggregated results
- Over maxItems: Items beyond the limit are silently dropped before batch creation
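The failed-batch behavior can be pictured as a try/except around each batch, so that one failure is recorded without stopping the others. The sketch below is illustrative only: `run_with_failure_isolation`, `process_batch`, and the shape of the failure records are hypothetical, and batches run sequentially here to keep the example short.

```python
from typing import Any, Callable


def run_with_failure_isolation(
    batches: list[list[Any]],
    process_batch: Callable[[list[Any]], list[Any]],
) -> dict[str, list[Any]]:
    """Run every batch; collect results and report items from failed batches."""
    results: list[Any] = []
    failed: list[dict[str, Any]] = []
    for index, batch in enumerate(batches):
        try:
            results.extend(process_batch(batch))
        except Exception as exc:
            # Record every item of the failed batch and continue with the next one.
            failed.extend(
                {"item": item, "batch": index, "error": str(exc)} for item in batch
            )
    return {"results": results, "failed": failed}
```

An empty list of batches returns immediately with empty `results` and `failed` lists, which matches the zero-count outcome described for empty arrays.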
Next Steps
- AI Steps — Configure AI reasoning within batched steps
- Scoring & Feedback — How batch scores feed back into future runs
- Loops & Conditionals — Control flow for complex workflows
