In the fast-paced world of travel technology, where data from hundreds of suppliers needs to be ingested and updated constantly, performance and reliability become mission-critical. Many companies rely on Tourplan, a leading travel management system, to handle supplier content via XML import functionalities. However, as the size of supplier catalogs has grown—sometimes reaching hundreds of thousands of products—the traditional XML import process began to fail. This bottleneck prompted developers to implement a smarter, chunked import pipeline to maintain system stability and performance.
TL;DR
When importing large supplier catalogs into Tourplan via XML, many companies faced repeated failures and timeout errors. The issue stemmed from the system’s inability to handle massive XML payloads in a single transaction. Introducing a chunked import pipeline, where data is broken into smaller, manageable parts, allowed successful processing without timeouts. This solution enhanced both reliability and scalability for Tourplan users handling large volumes of supplier data.
The Challenge with Large-Scale XML Imports
At its core, the XML data import function in Tourplan was designed to automate supplier content ingestion. Suppliers generate XML files containing massive datasets—hotels, excursions, pricing structures, seasonal rules, availability calendars, and more. These files are essential for travel companies to keep inventories up to date.
However, many teams began encountering a critical issue: as the size of these XML files increased, the import process began to fail due to timeouts, memory pressure on the servers, or outright crashes. The root problem boiled down to a few key limitations:
- Single-threaded imports: Tourplan’s XML import processed the entire file in one go, consuming vast resources.
- No progress tracking: Once the import process started, there was no way to pause, resume, or recover gracefully from failure.
- Lack of batching mechanisms: The import pipeline assumed all data could be parsed and validated in-memory.
The result? Failed imports, corrupted datasets, and a backlog of supplier updates. Simply increasing server resources was not a viable long-term solution.
Symptoms That Indicated a Bottleneck
Several clear symptoms emerged that pointed to a systemic bottleneck in Tourplan’s import pipeline:
- Timeout errors: Import jobs would run for hours only to be terminated by server watchdog timers.
- Partial data ingestion: Teams noticed that while some records were updated, others remained outdated, resulting in inconsistent pricing and availability.
- Increased support tickets: End users began noticing anomalies in itineraries and quotes generated from outdated catalogs.
With the stakes growing higher and the catalog sizes increasing year over year, a more scalable, resilient method needed to be developed.
The Solution: A Chunked Import Pipeline
Rethinking the import architecture led to a critical realization—if the data could be processed in stages, or chunks, the risk of timeouts and performance crashes could be significantly mitigated. This idea gave birth to the new chunked import pipeline.
What is Chunking? In the context of XML data imports, chunking refers to breaking large files into smaller, logically separated segments that are processed one at a time. Each segment could represent a specific kind of data (hotels, rates, calendars) or a slice of a single data type (e.g., 5,000 hotel objects per chunk).
The chunked import pipeline followed several best practices:
- Pre-processing step: The original XML file is first analyzed and split using a parser that identifies logical breakpoints such as closing tags or predefined object groupings.
- Queue-based processing: Each resulting chunk is added to a job queue and processed asynchronously to avoid overloading system memory.
- Fail-safe checkpoints: Each chunk carries metadata for auditing and can be retried independently in case of failure.
- Progress tracking and logging: Dashboards were implemented to track the success/failure of each chunk for full transparency.
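The fail-safe checkpoint idea above can be sketched as a small record attached to each chunk. This is a minimal illustration, not Tourplan's actual schema—the `ChunkJob` class and its field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ChunkJob:
    """Metadata carried by each chunk so it can be audited and retried independently."""
    chunk_id: int
    source_file: str
    record_count: int
    status: str = "pending"   # pending -> processing -> done | failed
    attempts: int = 0

    def mark_failed(self) -> None:
        """Record a failed processing attempt without touching other chunks."""
        self.attempts += 1
        self.status = "failed"

    def can_retry(self, max_attempts: int = 3) -> bool:
        """A failed chunk is retryable until it exhausts its attempt budget."""
        return self.status == "failed" and self.attempts < max_attempts

job = ChunkJob(chunk_id=1, source_file="suppliers.xml", record_count=5000)
job.mark_failed()
print(job.can_retry())  # True: one failure, budget of three attempts
```

A dashboard then only needs to aggregate these records by `status` to show import progress.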
Technical Implementation Details
Let’s zoom in on how this was technically achieved.
1. Chunking Mechanism
Using an XML streaming parser (such as SAX or StAX in Java, or lxml in Python), the file was read element by element instead of being loaded into memory all at once. Logical nodes (e.g., <Hotel>, <Excursion>) were extracted into separate files or memory blocks as standalone documents.
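A minimal sketch of this streaming split, using Python's standard-library `xml.etree.ElementTree.iterparse` in place of lxml (the element names and chunk size here are illustrative):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def split_hotels(xml_bytes: bytes, chunk_size: int = 2):
    """Stream <Hotel> elements from a large document and yield them in
    fixed-size chunks, clearing each element so memory stays flat."""
    chunk = []
    for _event, elem in ET.iterparse(BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "Hotel":
            chunk.append(ET.tostring(elem))  # serialize as a standalone document
            elem.clear()                     # release the parsed subtree
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk  # final partial chunk

sample = b"<Catalog>" + b"".join(
    f'<Hotel id="{i}"/>'.encode() for i in range(5)
) + b"</Catalog>"
chunks = list(split_hotels(sample))
print([len(c) for c in chunks])  # [2, 2, 1]
```

Because `iterparse` never builds the whole tree, a multi-gigabyte catalog can be split with near-constant memory.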
2. Asynchronous Worker Queue
A job queue, powered by tools like RabbitMQ or AWS SQS, managed the submission of chunked jobs. Multiple workers could run concurrently to process chunks across different CPU cores or cluster nodes, drastically improving performance.
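The worker pattern is the same regardless of broker. The sketch below stands in a stdlib `queue.Queue` for RabbitMQ/SQS so it runs self-contained; the "processing" step is a placeholder for the real import logic:

```python
import queue
import threading

def worker(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull chunks off the queue until a sentinel (None) arrives."""
    while True:
        chunk = jobs.get()
        if chunk is None:          # sentinel: no more work for this worker
            jobs.task_done()
            return
        processed = len(chunk)     # stand-in for the real import step
        with lock:
            results.append(processed)
        jobs.task_done()

jobs: queue.Queue = queue.Queue()
results: list = []
lock = threading.Lock()
workers = [threading.Thread(target=worker, args=(jobs, results, lock)) for _ in range(3)]
for w in workers:
    w.start()

# Three chunks of hotel records, e.g. produced by the streaming splitter.
for chunk in ([1] * 5000, [1] * 5000, [1] * 2500):
    jobs.put(chunk)
for _ in workers:
    jobs.put(None)                 # one sentinel per worker

jobs.join()
for w in workers:
    w.join()
print(sorted(results))  # [2500, 5000, 5000]
```

With a real broker, the queue also survives process restarts, so an interrupted import resumes from the unprocessed chunks.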
3. Error Handling Framework
If one chunk failed, it was logged separately and could be reprocessed without redoing the entire import. This reduced risk and shortened recovery times significantly.
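Per-chunk retry with backoff can be sketched as follows; `flaky_importer` is a hypothetical stand-in that fails twice before succeeding, simulating a transient database timeout:

```python
import time

def process_with_retry(chunk, importer, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry a single failed chunk in isolation, without redoing the whole import."""
    for attempt in range(1, max_attempts + 1):
        try:
            return importer(chunk)
        except Exception:
            if attempt == max_attempts:
                raise                                  # exhausted: surface for the dashboard
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

calls = {"n": 0}
def flaky_importer(chunk):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient DB timeout")
    return f"imported {len(chunk)} records"

print(process_with_retry([1] * 5000, flaky_importer))  # imported 5000 records
```

Because the retry scope is one chunk, a transient failure costs seconds rather than forcing a multi-hour re-run.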
Benefits Seen in Production
After rolling out the chunked import system, several travel companies observed marked improvements:
- 90% reduction in import failures: Imports that previously failed due to timeout now completed without issues.
- Faster recovery: Failed chunks could be retried instantly, allowing for more agile error correction.
- Reduced server load: Because chunks were smaller and processed asynchronously, memory and CPU usage stabilized.
- Transparency: Import logs and dashboards provided clear visibility into which data was processed and which wasn’t.
This approach proved particularly effective during peak travel seasons when supplier updates are frequent and time-sensitive. Teams could schedule nightly or hourly imports without fear of bringing down systems or generating corrupted itineraries.
Lessons Learned
This experience offered several critical lessons for ETL (Extract, Transform, Load) processes in modern travel platforms:
- Scale matters: What works for thousands of records may break at millions—systems must evolve with data volume.
- Observability is key: Logs, metrics, and dashboards should be foundational to any automated import system.
- Design for failure: Everything should be retryable, and no operation should ever assume a “perfect run.”
Future Improvements and Next Steps
While the chunked pipeline was a game-changer, innovation didn’t stop there. Several companies are now exploring:
- Real-time Supplier API integrations: Bypassing XML file dumps altogether by syncing data via REST APIs.
- Data validation at the edge: Implementing pre-import validation using XSDs and JSON-Schema to reduce garbage-in scenarios.
- Auto-scaling infrastructure: Using Kubernetes or serverless frameworks to dynamically scale the number of import workers based on job volume.
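Pre-import validation at the edge can start very simply. The sketch below uses only the standard library as a lightweight stand-in for full XSD validation: it rejects malformed XML and records missing required attributes before anything is queued. The `Hotel` contract (`id`, `name` required) is hypothetical:

```python
import xml.etree.ElementTree as ET

REQUIRED_HOTEL_ATTRS = {"id", "name"}  # hypothetical supplier contract

def prevalidate(xml_text: str) -> list:
    """Cheap edge check: return a list of problems, empty if the payload looks importable."""
    errors = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]
    for hotel in root.iter("Hotel"):
        missing = REQUIRED_HOTEL_ATTRS - hotel.attrib.keys()
        if missing:
            errors.append(f"Hotel missing {sorted(missing)}")
    return errors

good = '<Catalog><Hotel id="1" name="Alpine Lodge"/></Catalog>'
bad = '<Catalog><Hotel id="2"/></Catalog>'
print(prevalidate(good))  # []
print(prevalidate(bad))   # ["Hotel missing ['name']"]
```

A production setup would replace this with schema validation (e.g., lxml's `XMLSchema` against the supplier's XSD), but even this level of checking keeps obviously broken payloads out of the queue.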
Conclusion
Data is the lifeblood of any modern travel company. As supplier ecosystems grow more complex, systems like Tourplan must evolve to handle increasingly large and frequent updates. The move to a chunked import pipeline not only solved the issue of XML import timeouts but also opened the door to a more robust, efficient, and scalable data management ecosystem.
Companies that have embraced this architecture are now processing imports faster, with greater accuracy and uptime—turning a former pain point into a competitive advantage.