Automating the Clone Sequence Analysis and Lead Identification Process
A number of us have worked for years in automation and integration, and while sophisticated systems exist to automate complex tasks, orchestration between systems is often lacking. It's no small feat to get disparate software systems talking to one another - that's at the heart of every integration challenge. But leveraging that integration to solve bigger problems and to automatically move complex processes forward - that's orchestration, a benefit that too many integration projects stop just short of achieving.
At its heart, orchestration is about automating an entire process instead of just a task. The goal is to complete multiple steps in multiple software systems automatically, so that when a scientist is "looped back into the process" much of what had previously been manual, tedious, or both, is already completed. What awaits the scientist is a decision, the application of the scientist's expertise, instead of the data entry, aggregation, formatting, and analysis that computers are supposed to be handling on their behalf.
Orchestration doesn't require a large software engineering team or extensive and sophisticated tooling. It boils down to two principal requirements: (1) you can find the information you need to make a decision and (2) you can update the system with new information based on that decision*. It doesn't matter what acronym is used - REST, POX, SOAP, JSON, XML, RPC, etc - tools and frameworks exist to handle all of them. If the system allows you to search for usable data and to update specific pieces of data, you have what you need.
Making sure you can get useful and specific information is a key first step. Generally speaking, documents aren't data - if the information you need is embedded in a table in a PDF file, it's harder to use than if it's in an automation-friendly format like JSON, XML, CSV, or Excel. Even if the data is transmitted in a widely-used format, it needs to be searchable. If the only mechanism for finding information is knowing a record's unique identifier ahead of time, you can't search for the information you need. Your only recourse is to walk the entire database one record at a time (assuming the unique IDs are even enumerable), a very time-consuming task even for a computer.
In our Workflow Automation article we spoke at a high-level about using orchestration to automate the sequence analysis and lead identification process of a discovery campaign. In this article, we’re going to get into the details. As a refresher, we need to do the following:
- Load our reads in whatever format we receive them
- Analyze the sequences to identify their variable domains and germlines
- Identify (or present to the scientist) unique clones of interest whose assay results meet certain criteria
- Register new plates that contain the matching clones
We’re also going to register any new variable regions that we find as records in our database. You might not currently be capturing variable regions as distinct database records, but doing so creates valuable opportunities for automation. The record can be connected to all of the different formats that use it and can form the basis for generating panels of new records based on your format of choice.
I’m going to assume a few things going forward:
- You have some experience writing automation scripts, in whatever language you prefer.
- You know what an application programming interface (API) is and how to use one.
- Every read in your sequencing results file contains plate and well information that you can extract programmatically.
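That last assumption is worth making concrete. Providers encode plate and well information in read names in different ways, so the pattern below is a hypothetical naming convention (`<plate>_<well>_...`) used purely for illustration - adjust the regular expression to match whatever your provider actually emits:

```python
import re

# Hypothetical read-naming convention: "<plate>_<well>_...", e.g. "PLATE03_A01_heavy".
# Your provider's convention will differ -- adjust the pattern accordingly.
READ_NAME_PATTERN = re.compile(r"^(?P<plate>[A-Za-z0-9-]+)_(?P<well>[A-H]\d{2})_")

def plate_and_well(read_name: str) -> tuple[str, str]:
    """Extract (plate, well) from a read identifier, or raise ValueError."""
    match = READ_NAME_PATTERN.match(read_name)
    if match is None:
        raise ValueError(f"read name does not encode plate/well: {read_name!r}")
    return match.group("plate"), match.group("well")

print(plate_and_well("PLATE03_A01_heavy"))  # → ('PLATE03', 'A01')
```

If you can't write a function like this for your results files, sort that out with your provider first - everything downstream depends on tying each sequence back to a physical well.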
Ideally your sequencing provider would let you register a webhook, but they probably just email you a file. Email could be your automation interface if you're up for it. If you designate an email address specifically for emails with sequence files attached, you can write a script that downloads the attachments, stores them in a designated place, and then uploads them to your analysis software's API.
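Such a script can be surprisingly small. Here's a sketch using Python's standard-library `imaplib` and `email` modules; the host name, mailbox, credentials, and download directory are all hypothetical placeholders, and the set of file extensions should reflect whatever your provider actually sends:

```python
import email
import imaplib
import pathlib

# All hypothetical -- substitute your own mail server, mailbox,
# and destination directory.
IMAP_HOST = "imap.example.com"
MAILBOX = "sequencing-results"
DOWNLOAD_DIR = pathlib.Path("incoming_reads")

SEQUENCE_EXTENSIONS = {".fastq", ".fasta", ".ab1", ".zip"}

def is_sequence_file(filename: str) -> bool:
    """Keep only attachments that look like sequencing output."""
    return pathlib.Path(filename).suffix.lower() in SEQUENCE_EXTENSIONS

def fetch_attachments(user: str, password: str) -> list[pathlib.Path]:
    """Download sequence-file attachments from unread messages."""
    saved = []
    DOWNLOAD_DIR.mkdir(exist_ok=True)
    with imaplib.IMAP4_SSL(IMAP_HOST) as imap:
        imap.login(user, password)
        imap.select(MAILBOX)
        _, data = imap.search(None, "UNSEEN")
        for num in data[0].split():
            _, msg_data = imap.fetch(num, "(RFC822)")
            message = email.message_from_bytes(msg_data[0][1])
            for part in message.walk():
                filename = part.get_filename()
                if filename and is_sequence_file(filename):
                    path = DOWNLOAD_DIR / filename
                    path.write_bytes(part.get_payload(decode=True))
                    saved.append(path)
    return saved
```

Run it on a schedule (cron, Task Scheduler) against the dedicated mailbox, then hand the saved files to the upload step described next.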
The next step is to submit your sequence files to your sequence analysis software. StackWave Affinity provides a REST API for sequence upload, and upon upload the system will automatically analyze clone sequences. Ideally, if you're using a different solution for sequence analysis, it will provide such an API as well, but at least at the time of this writing, not all sequence analysis solutions do. There may thus be a manual step required to both upload your sequences and obtain your results - if this is true in your case, then you'll want to include a step at the end of the previous script that emails you when the upload is complete so that you know to go get your results!
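If your analysis software does expose an upload API, the call itself is typically a single authenticated POST. The endpoint URL, token, and `X-Filename` header below are hypothetical stand-ins - consult your software's API documentation for the real upload route and authentication scheme:

```python
import mimetypes
import pathlib
import urllib.request

# Hypothetical values -- substitute the real upload endpoint and API token
# from your sequence analysis software's API documentation.
UPLOAD_URL = "https://affinity.example.com/api/sequence-files"
API_TOKEN = "changeme"

def guess_content_type(filename: str) -> str:
    """Best-effort MIME type for the upload; fall back to a binary default."""
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"

def upload_sequence_file(path: pathlib.Path) -> int:
    """POST one sequence file to the analysis API; return the HTTP status."""
    request = urllib.request.Request(
        UPLOAD_URL,
        data=path.read_bytes(),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": guess_content_type(path.name),
            "X-Filename": path.name,  # hypothetical header carrying the name
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Chain this onto the end of the attachment-downloading script and the reads flow from inbox to analysis without anyone touching them.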
With the analysis results in hand, unique clones can be identified based on the expressed Fab fragment and new variable regions registered in our LIMS database. As mentioned above, this is an automatic next step in Affinity. Clone sequences are analyzed as soon as they are uploaded, the variable domain DNA and AA sequences are recorded in the sequence database, new variable region entities are registered, and clones are called for each paired heavy and light chain identified. If your LIMS system can’t perform these steps automatically, you’ll need to incorporate appropriate calls to other software programs or APIs that can, then use your LIMS’s API to record the results.
Next, we want to associate any screening data we’ve captured with our sequenced clone. Ideally the files with this data would automatically be uploaded to the LIMS API as those files were emitted by their corresponding instruments. Absent such a system, though, a solution would be to create a program that periodically monitors the directory into which these files are placed. When a new file is created, the program would then upload it to the API. With Affinity this is a one-step process - POST the file to the assay results API and call it a day.**
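A minimal polling watcher needs nothing beyond the standard library. This sketch assumes a hypothetical instrument output directory and takes your upload routine as a parameter (whatever function POSTs a file to your assay results API); a production version would also want to confirm each file is fully written before uploading it:

```python
import pathlib
import time

# Hypothetical location -- point this at the directory your instruments write to.
WATCH_DIR = pathlib.Path("instrument_output")
POLL_SECONDS = 30

def unprocessed(paths, seen):
    """Return the paths not yet handled, in a stable order."""
    return sorted(p for p in paths if p not in seen)

def scan(directory: pathlib.Path, seen: set) -> list[pathlib.Path]:
    """List new files in the watched directory."""
    return unprocessed((p for p in directory.glob("*") if p.is_file()), seen)

def watch(upload, poll_seconds: int = POLL_SECONDS) -> None:
    """Poll forever, handing each new file to the upload function exactly once."""
    seen: set[pathlib.Path] = set()
    while True:
        for path in scan(WATCH_DIR, seen):
            upload(path)  # e.g. POST to your assay results API
            seen.add(path)
        time.sleep(poll_seconds)
```

For heavier use, filesystem-notification libraries can replace the polling loop, but a 30-second poll is usually plenty for instrument output.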
We are now ready to identify the best clones of the bunch to be expressed in antibody formats of interest. If our criteria are well-known, we can simply search the assay results for our clones and retrieve only those that match our criteria to submit for expression. This is straightforward using the Affinity API: for a given type of entity you can retrieve all results with assay data matching your query. If your criteria aren’t well-known or you prefer to peruse the data before making a decision, this becomes the point at which a scientist would be looped back into the process.
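In code, the well-known-criteria case is one parameterized GET. The endpoint and parameter names below (`kd_nM_max`, `blocking_pct_min`) are hypothetical examples of assay criteria, not real Affinity API fields - the actual query syntax comes from your LIMS's API documentation. The local `meets_criteria` helper shows the same filter applied client-side, which is handy when you'd rather pull everything and let a scientist explore:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameter names -- substitute the real query
# fields from your LIMS API documentation.
RESULTS_URL = "https://affinity.example.com/api/clones"

def meets_criteria(clone: dict, kd_nM_max: float, blocking_pct_min: float) -> bool:
    """Client-side version of the filter: keep tight, blocking binders."""
    return clone["kd_nM"] <= kd_nM_max and clone["blocking_pct"] >= blocking_pct_min

def clones_matching(criteria: dict) -> list:
    """Ask the server to filter and return the matching clone records."""
    query = urllib.parse.urlencode(criteria)
    with urllib.request.urlopen(f"{RESULTS_URL}?{query}") as response:
        return json.load(response)

# e.g. leads = clones_matching({"kd_nM_max": 5, "blocking_pct_min": 90})
```

The list this returns is exactly what gets submitted for expression - or, in the scientist-in-the-loop case, what lands in front of the scientist as a short, pre-filtered candidate set.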
We hope this article starts you on the path to adding more Workflow Automation in your organization. If you’d like to learn more about all of the automated workflows built into Affinity or to explore opportunities for automation and orchestration through the Affinity REST API, please reach out. We’d be happy to demonstrate Affinity to you or to set you up with a trial so that you can try them yourself. Happy automating!
* ... in a secure manner and with a versioning system that won't cause your orchestration to break without notice!
** If this is something you’re interested in but don’t have the time or expertise to do, check out StackWave’s SDMS solution.