Merge Pipeline Technical Architecture
At a high level, our integration pipeline fetches raw data from an institution's SIS and serializes it into a format Coursedog understands so that it can be stored in our database. It can also take data from Coursedog and send updates back to the institution's SIS. The pipeline is therefore bidirectional: data in Coursedog and the SIS is kept in sync, regardless of which system the data was first updated in.
Our integration pipeline consists of three distinct components, each with its own area of responsibility:
- Core Pipeline: Our data-agnostic pipeline in charge of fetching, validating, merging, and persisting data.
- SIS/School API Serializers: Serializers that transform SIS data into Coursedog data and vice versa. The Core Pipeline calls them when formatting data received from the SIS or persisting data to it.
- Job Runners: Responsible for scheduling, sandboxing, and running integrations. They ultimately call and run the core integration pipeline.
Let's look at each of these components in more detail.
Core Pipeline
[Figure: simple graphic of the pipeline]
The core pipeline is in charge of getting data from the SIS to Coursedog and vice versa. Nothing in the core pipeline is school-specific, and it makes no assumptions about the data it's operating on (other than that it's JSON), which makes it very flexible. Ultimately, the core pipeline decides how to MOVE data from point A to point B and does NOT care how that data is structured (that's the serializer's job).
There are five distinct steps in the integration pipeline:
- Fetching data from the SIS and Coursedog
- Verifying the data received from the SIS and Coursedog
- Merging the data into one in-sync copy
- Persisting the changed data back to Coursedog and, optionally, the SIS. Not all integrations push data back to the SIS. Persisting SIS data to Coursedog is not a hard requirement of the pipeline either, but in practice all integrations do it
- Generating, persisting, and returning a merge report documenting the results of each step and any errors
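The five steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation: every name here (`runPipeline`, `PipelineDeps`, `mergeById`, `MergeReport`) is hypothetical, the merge rule (match on an `id` field, SIS values win) is assumed, and the calls are synchronous for brevity where real fetches and writes would be asynchronous.

```typescript
// Hypothetical names throughout; none are taken from the Coursedog codebase.
type Json = Record<string, unknown>;
type StepName = "fetch" | "verify" | "merge" | "persist" | "report";

interface StepResult {
  step: StepName;
  ok: boolean;
  error?: string;
}

interface MergeReport {
  steps: StepResult[];
  merged: Json[];
}

// Everything specific to a data source is injected, so the pipeline stays data-agnostic.
interface PipelineDeps {
  fetchSis: () => Json[];
  fetchCoursedog: () => Json[];
  isValid: (record: Json) => boolean;
  persist: (records: Json[]) => void;
}

// Assumption: records are matched on an "id" field and SIS values win on conflict.
function mergeById(coursedog: Json[], sis: Json[]): Json[] {
  const merged: Json[] = [];
  const position: Record<string, number> = Object.create(null);
  for (const record of coursedog.concat(sis)) {
    const id = String(record["id"]);
    const at = position[id];
    if (at === undefined) {
      position[id] = merged.length;
      merged.push({ ...record });
    } else {
      merged[at] = { ...merged[at], ...record };
    }
  }
  return merged;
}

function runPipeline(deps: PipelineDeps): MergeReport {
  const report: MergeReport = { steps: [], merged: [] };

  // Run one step and record whether it succeeded.
  const attempt = (step: StepName, work: () => void): boolean => {
    try {
      work();
      report.steps.push({ step: step, ok: true });
      return true;
    } catch (e) {
      const error = e instanceof Error ? e.message : String(e);
      report.steps.push({ step: step, ok: false, error: error });
      return false;
    }
  };

  let sis: Json[] = [];
  let coursedog: Json[] = [];

  // Assumption: once a step fails, the remaining data steps are skipped.
  let healthy = attempt("fetch", () => {
    sis = deps.fetchSis();
    coursedog = deps.fetchCoursedog();
  });
  healthy = healthy && attempt("verify", () => {
    for (const record of sis.concat(coursedog)) {
      if (!deps.isValid(record)) {
        throw new Error("invalid record: " + JSON.stringify(record));
      }
    }
  });
  healthy = healthy && attempt("merge", () => {
    report.merged = mergeById(coursedog, sis);
  });
  if (healthy) {
    attempt("persist", () => deps.persist(report.merged));
  }

  // The report step is recorded regardless of how far the run got.
  report.steps.push({ step: "report", ok: true });
  return report;
}
```

Because each step is recorded as it runs, a failed verification still yields a report showing which step stopped the run and why.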
Some auxiliary yet essential tasks are also performed in the core pipeline, such as:
- processing provided Merge Settings, which specify which steps to run (e.g. sometimes a school may want to test merging data without persisting anything)
- performing a backup of the current state of the SIS and Coursedog before anything gets persisted
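One way to picture how Merge Settings and the backup interact is a small planning function that decides which steps a run will include. This is a sketch under assumptions: the `MergeSettings` fields and the `planSteps` function are invented for illustration, and the real settings may look quite different.

```typescript
// Hypothetical settings shape; the real Merge Settings fields are not described here.
interface MergeSettings {
  persistToCoursedog: boolean;
  persistToSis: boolean;
}

type PlannedStep =
  | "fetch"
  | "verify"
  | "merge"
  | "backup"
  | "persistToCoursedog"
  | "persistToSis"
  | "report";

// Decide which steps to run, in order, for the given settings.
function planSteps(settings: MergeSettings): PlannedStep[] {
  const steps: PlannedStep[] = ["fetch", "verify", "merge"];

  // A backup is only taken when something is about to be persisted.
  if (settings.persistToCoursedog || settings.persistToSis) {
    steps.push("backup");
  }
  if (settings.persistToCoursedog) {
    steps.push("persistToCoursedog");
  }
  if (settings.persistToSis) {
    steps.push("persistToSis");
  }

  steps.push("report");
  return steps;
}

// A dry run: merge the data and report on it without persisting anything.
const dryRun = planSteps({ persistToCoursedog: false, persistToSis: false });
```

Under these assumptions a dry run skips both the backup and the persistence steps but still produces a merge report.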
School API Serializers ("Formatters")
Since nothing in the Core Pipeline is school-specific, and each SIS platform has its own representation of data (and some schools have their own idiosyncrasies or differing conventions on top of that), it's important that we can easily specify how to shape data when receiving it from or persisting it to an institution. Each serializer is concerned with how to TRANSFORM data from one format to another.
At a high level, each formatter is a serializer that maps data from an SIS's data model to Coursedog's data model or vice versa (and, using Coursedog's data model as the common ground, even from one SIS data model to another SIS data model; you get the idea).
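As a rough illustration of this idea, a formatter can be modeled as a pair of mapping functions around a shared Coursedog shape. Everything below is hypothetical: the `Serializer` interface, the `CoursedogSection` fields, and the two SIS record shapes are made up for the example and do not reflect any real SIS or the actual Coursedog data model.

```typescript
// Hypothetical common data model used as the shared ground between systems.
interface CoursedogSection {
  id: string;
  title: string;
  credits: number;
}

// A formatter maps one SIS's record shape to the common model and back.
interface Serializer<SisRecord> {
  toCoursedog: (record: SisRecord) => CoursedogSection;
  fromCoursedog: (section: CoursedogSection) => SisRecord;
}

// Two made-up SIS record shapes with different conventions.
interface SisASection {
  SECTION_ID: string;
  TITLE: string;
  CREDIT_HRS: string; // this imaginary SIS stores numbers as strings
}

interface SisBSection {
  key: string;
  name: string;
  units: number;
}

const sisAFormatter: Serializer<SisASection> = {
  toCoursedog: (r) => ({ id: r.SECTION_ID, title: r.TITLE, credits: Number(r.CREDIT_HRS) }),
  fromCoursedog: (s) => ({ SECTION_ID: s.id, TITLE: s.title, CREDIT_HRS: String(s.credits) }),
};

const sisBFormatter: Serializer<SisBSection> = {
  toCoursedog: (r) => ({ id: r.key, title: r.name, credits: r.units }),
  fromCoursedog: (s) => ({ key: s.id, name: s.title, units: s.credits }),
};

// SIS-to-SIS conversion falls out of composing two formatters through the common model.
function convert<From, To>(from: Serializer<From>, to: Serializer<To>, record: From): To {
  return to.fromCoursedog(from.toCoursedog(record));
}

const sisARecord: SisASection = { SECTION_ID: "MATH-101-01", TITLE: "Calculus I", CREDIT_HRS: "4" };
const asSisB = convert(sisAFormatter, sisBFormatter, sisARecord);
```

The point of the design is that adding a new SIS only requires one new formatter, since every conversion passes through the common model.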
Job Runners
Job Runners are responsible for establishing the environment in which integrations are ultimately called. Integrations run in three general contexts:
- Nightly: Runs all of a school's enabled integrations every night
- Realtime: Runs an integration for just a specific Coursedog entity when it's changed (e.g. when a section is edited)
- Manual: Performs an integration on demand
Each of these contexts runs in its own sandboxed Bull.js queue, and each runner provides its own methods for scheduling or running an integration, along with handlers for when an integration completes, succeeds, or fails. Coursedog has predefined these three types of runners for use at various points in the implementation and application life cycle, in accordance with the school's requirements and capabilities.
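The runner idea can be sketched without any queue library. The example below deliberately does not use Bull's API: a plain in-memory array stands in for the real queue, and all names (`createRunner`, `RunnerHooks`, `IntegrationJob`) are hypothetical. It only shows how each runner might own its queue and its own success, failure, and completion handlers.

```typescript
// Hypothetical names; the in-memory array is a stand-in for a real queue such as Bull.
type RunnerKind = "nightly" | "realtime" | "manual";

interface IntegrationJob {
  school: string;
  integration: string;
  entityId?: string; // assumed to be set only for realtime runs
}

interface RunnerHooks {
  onSuccess: (job: IntegrationJob) => void;
  onFailure: (job: IntegrationJob, error: string) => void;
  onComplete: (job: IntegrationJob) => void;
}

interface JobRunner {
  kind: RunnerKind;
  enqueue: (job: IntegrationJob) => void;
  drain: () => void;
}

function createRunner(
  kind: RunnerKind,
  runIntegration: (job: IntegrationJob) => void,
  hooks: RunnerHooks
): JobRunner {
  // Each runner owns its queue, so one context cannot see another's jobs.
  const queue: IntegrationJob[] = [];

  return {
    kind: kind,
    enqueue: (job) => {
      queue.push(job);
    },
    // Process queued jobs in order, routing each outcome to the runner's hooks.
    drain: () => {
      let job = queue.shift();
      while (job !== undefined) {
        let error: string | undefined;
        try {
          runIntegration(job);
        } catch (e) {
          error = e instanceof Error ? e.message : String(e);
        }
        if (error === undefined) {
          hooks.onSuccess(job);
        } else {
          hooks.onFailure(job, error);
        }
        hooks.onComplete(job); // fires after either outcome
        job = queue.shift();
      }
    },
  };
}
```

In this sketch a failing job does not stop the queue: the failure handler runs, the completion handler runs, and the next job is picked up.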