Run Design CSV specification for PacBio

The Run Design CSV is a comma-separated file which can be imported into SMRT Link to create a run design. Each line in the CSV represents a sample.

| Key | Value Example | Value Spec | Internal |
|---|---|---|---|
| Experiment Name | NoRS_Standard_Edna.1 | Can be any ASCII string. Defaults to Run Name. | |
| Experiment Id | 325/3250057 | Must be a valid experiment ID. Details below. | |
| Experiment Description | 20170530_A6_Iguana_VVnC_SampleSheet_TEMPLATE | Can be any ASCII string. Defaults to Run Comments. | |
| Run Name | 20170530_A6_Iguana_VVnC_SampleSheet_TEMPLATE | Can be any ASCII string. | |
| System Name | Sequel | Must be either Sequel or Sequel II. | |
| Run Comments | ecoliK12_pbi_March2013 | Can be any ASCII string. | |
| Is Collection | TRUE | Must be a Boolean value. Boolean details below. | |
| Sample Well | A01 | Must be a valid well number. Details below. | |
| Sample Name | SMS_Iguana_A6_3230046_A01_TestCase_SB_BindKit_ChemKitv2_8rxnKit | Can be any ASCII string. | |
| Cell No. | 1 | Must be an integer from 1 to 8. Details below. | True |
| Sequencing Mode | CLR | Must be either CLR or CCS. | |
| Generate CCS Data | FALSE | Must be a Boolean value. Boolean details below. Default is False. | |
| Movie Time per SMRT Cell (hours) | 5 | Must be a float >= 0.1 and <= 20 for Sequel, <= 30 for Sequel II. | |
| Use Predictive Loading | TRUE | Must be a Boolean value. Boolean details below. | |
| Loading Target (P1 + P2) | 0.4 | Must be a float between 0.01 and 1. | |
| Maximum Loading Time (hours) | 1.2 | Must be a float between 1 and 2. | |
| Sample Comment | SMS_Iguana_A6_3230046_A01_TestCase_SB_BindKit_ChemKit | Can be any ASCII string. | |
| Insert Size (bp) | 2000 | Must be an integer >= 10. Units are in bp. | |
| On-Plate Loading Concentration (pM) | 5 | Must be a float. Units are in pM. | |
| Size Selection | FALSE | Must be a Boolean value. Boolean details below. Default is False. | |
| Stage Start | FALSE | Must be a Boolean value. Boolean details below. Default is False. | True |
| Reuse Sample | FALSE | Must be a Boolean value. Boolean details below. Default is False. | True |
| Template Prep Kit Box Barcode | DM1117100259100111716 | Must be valid kit barcode. Details below. | |
| DNA Control Complex Box Barcode | DM1234101084300123120 | Must be valid kit barcode. Details below. | |
| Binding Kit Box Barcode | DM1117100862200111716 | Must be valid kit barcode. Details below. | |
| Sequencing Kit Box Barcode | DM0001100861800123120 | Must be valid kit barcode. Details below. | |
| Wash Kit Box Barcode | DM2222100866100123120 | Must be valid kit barcode. Details below. | |
| Automation Name | Diffusion | Can be “diffusion”, “magbead”, or a custom script. Details below. | |
| Automation Parameters | ExtensionTime=double:60\|ExtendFirst=boolean:True | Must follow format demonstrated in Value Example. Details below. | |
| Primary Analysis | Default | Can be any ASCII string. | True |
| Primary Analysis Parameters | CopyFileTrace=boolean:true | Must follow format demonstrated in Value Example. Details below. | True |
| Sample is Barcoded | TRUE | Must be a Boolean value. Details on booleans below. | |
| Barcode Set | dad4949d-f637-0979-b5d1-9777eff62008 | Must be a uuid for a barcodeset present in the database. | |
| Same Barcodes on Both Ends of Sequence | TRUE | Must be a Boolean value. Details on booleans below. | |
| Barcode Name | lbc1--lbc1 | Must be a valid barcode pair. | |
| Bio Sample Name | sample1 | Can be any ASCII string. | |
| Pipeline Id | pbsmrtpipe.pipelines.sa3_ds_isoseq3_with_genome | Must be a valid pbsmrtpipe pipeline Id. | |
| Analysis Name | sample1 analysis | Can be any ASCII string. | |
| Entry Points | PacBio.DataSet.BarcodeSet;eid_barcode;afe89e3f-17ca-e9b8-eae9-b701dbb1f02d | A “\|” separated list with entries: file_type;entry_id;uuid | |
| Task Options | isocollapse.task_options.allow_extra_5exon;boolean;false | A “\|” separated list with entries: task_id;value_type;value | |
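
The numeric limits listed in the table can be checked mechanically. The following is a minimal sketch, not SMRT Link's own validation: the function name `check_numeric_fields`, the `system_name` argument, and the choice to skip blank fields are assumptions made for illustration. The row is assumed to be a dictionary keyed by the field names in the Key column.

```python
# Upper movie-time limit in hours for each System Name in the table.
MAX_MOVIE_HOURS = {"Sequel": 20.0, "Sequel II": 30.0}


def check_numeric_fields(row, system_name="Sequel"):
    """Return error messages for the numeric fields present in one row.

    Hypothetical helper: blank or missing fields are skipped here.
    """
    errors = []

    def parse(key, cast):
        raw = (row.get(key) or "").strip()
        if not raw:
            return None
        try:
            return cast(raw)
        except ValueError:
            errors.append(f"{key}: {raw!r} is not a valid {cast.__name__}")
            return None

    def require(key, value, ok, rule):
        if value is not None and not ok:
            errors.append(f"{key}: {value} {rule}")

    upper = MAX_MOVIE_HOURS[system_name]  # KeyError for an unknown system
    key = "Movie Time per SMRT Cell (hours)"
    v = parse(key, float)
    require(key, v, v is not None and 0.1 <= v <= upper, f"is not in [0.1, {upper}]")

    key = "Loading Target (P1 + P2)"
    v = parse(key, float)
    require(key, v, v is not None and 0.01 <= v <= 1, "is not in [0.01, 1]")

    key = "Maximum Loading Time (hours)"
    v = parse(key, float)
    require(key, v, v is not None and 1 <= v <= 2, "is not in [1, 2]")

    key = "Insert Size (bp)"
    v = parse(key, int)
    require(key, v, v is not None and v >= 10, "is less than 10")

    key = "Cell No."
    v = parse(key, int)
    require(key, v, v is not None and 1 <= v <= 8, "is not in [1, 8]")

    return errors
```

A movie time of 25 hours, for example, is reported for "Sequel" but accepted for "Sequel II".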

General Requirements

The CSV may contain only ASCII characters. Specifically, it must satisfy the regular expression:

  • /^[\x00-\x7F]*$/g
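
As an illustration, the ASCII requirement can be checked before import with a few lines of Python. This is a sketch only; the function name `is_ascii_only` is hypothetical, and `re.fullmatch` is used in place of the `^...$` anchors.

```python
import re

# Same character class as the regular expression above.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def is_ascii_only(text):
    """Return True if every character of text is in the ASCII range."""
    return ASCII_ONLY.fullmatch(text) is not None
```

Typographic characters such as curly quotes or an en dash, which spreadsheet tools sometimes substitute automatically, fail this check.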

Required Fields

  • Run Name
  • Sample Well
  • Sample Name
  • Movie Time per SMRT Cell (hours)
  • Insert Size (bp)
  • Template Prep Kit Box Barcode
  • Binding Kit Box Barcode
  • Sequencing Kit Box Barcode
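
A check for the required fields might look like the sketch below. It assumes that the first line of the CSV is a header row whose column names match the keys above, and it treats every line as a collection line; barcoded sample lines, which need fewer fields (see Is Collection), would have to be handled separately. The function names are hypothetical.

```python
import csv
import io

REQUIRED_FIELDS = [
    "Run Name",
    "Sample Well",
    "Sample Name",
    "Movie Time per SMRT Cell (hours)",
    "Insert Size (bp)",
    "Template Prep Kit Box Barcode",
    "Binding Kit Box Barcode",
    "Sequencing Kit Box Barcode",
]


def missing_required_fields(row):
    """Return the required field names that are absent or blank in one row."""
    return [name for name in REQUIRED_FIELDS if not (row.get(name) or "").strip()]


def check_csv(text):
    """Map the line number of each incomplete row to its missing fields."""
    reader = csv.DictReader(io.StringIO(text))
    problems = {}
    for row in reader:
        missing = missing_required_fields(row)
        if missing:
            # line_num is the number of physical lines read so far.
            problems[reader.line_num] = missing
    return problems
```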

Is Collection

This field indicates whether the line specifies a collection (TRUE) or a barcoded sample (FALSE). Collection lines should leave the Barcode Name and Bio Sample Name fields blank. Barcoded sample lines only need to contain the Is Collection, Sample Name, Barcode Name, and Bio Sample Name fields.

Experiment ID

Experiment IDs cannot contain the following characters: <, >, :, ", \, |, ?, or *. Experiment IDs cannot start or end with a “/” and cannot have two adjacent “/”, i.e. “//”. Experiment IDs also cannot contain spaces. Specifically, an Experiment ID must not match any of the following regular expressions:

  • /[<>:"\\|?\*]/g
  • /(?:^\/)|\/\/|(?:\/$)/
  • / /g
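
The three expressions above translate directly to Python. In this sketch an ID is considered valid only if none of the patterns is found anywhere in it; the function name `experiment_id_is_valid` is an assumption for illustration.

```python
import re

# One compiled pattern per regular expression listed above.
FORBIDDEN_CHARACTERS = re.compile(r'[<>:"\\|?*]')
MISPLACED_SLASHES = re.compile(r"(?:^/)|//|(?:/$)")
SPACES = re.compile(r" ")


def experiment_id_is_valid(experiment_id):
    """Return True if none of the forbidden patterns occurs in the ID."""
    patterns = (FORBIDDEN_CHARACTERS, MISPLACED_SLASHES, SPACES)
    return not any(p.search(experiment_id) for p in patterns)
```

With these patterns, "325/3250057" is accepted, while "/325", "325//1", "325/" and "325 1" are rejected.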

Cell No.

The cell numbers must satisfy a valid cell re-use scheme.

Sample Well

The well number must start with a letter “A” through “H”, and end in a number “01” through “12”, i.e. “A01” through “H12”. In other words, it must satisfy the regular expression:

  • /^[A-H](?:0[1-9]|1[0-2])$/
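
A minimal Python check for the well format, using the same pattern, could look like this. The function name is hypothetical, and `re.fullmatch` replaces the explicit anchors.

```python
import re

# Row letter A-H followed by a two-digit column 01-12.
SAMPLE_WELL = re.compile(r"[A-H](?:0[1-9]|1[0-2])")


def sample_well_is_valid(well):
    """Return True for wells "A01" through "H12"."""
    return SAMPLE_WELL.fullmatch(well) is not None
```

Note that the pattern is case-sensitive and requires the leading zero, so "a01" and "A1" are both rejected.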

Automation Name

The automation name can be either “diffusion” or “magbead” and is not case-sensitive. A path can also be used, such as “/path/to/my/script/my_script.py”. The path will not be processed further, so if the full URI is required, it must be provided in the CSV, e.g. “chemistry://path/to/my/script/my_script.py”.

Boolean Values

Acceptable boolean values for true are:

  • “true”
  • “t”
  • “yes”
  • “y”

Acceptable boolean values for false are:

  • “false”
  • “f”
  • “no”
  • “n”

Boolean values are not case-sensitive.
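
The accepted spellings can be captured in a small parser. This is a sketch under one assumption not stated above: surrounding whitespace is stripped before comparison. The function name `parse_boolean` is hypothetical.

```python
TRUE_VALUES = {"true", "t", "yes", "y"}
FALSE_VALUES = {"false", "f", "no", "n"}


def parse_boolean(value):
    """Convert an accepted boolean spelling to bool, ignoring case."""
    normalized = value.strip().lower()  # stripping whitespace is an assumption
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError(f"not an accepted boolean value: {value!r}")
```

Values outside the two lists, such as "1" or "0", raise a `ValueError` in this sketch.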

Kit Barcodes

The kit barcodes are composed of three parts:

  • Lot Number (ex: “DM1234”)
  • Part Number (ex: “100-619-300”)
  • Expiration Date (ex: “2020-12-31”)

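These parts are concatenated into a single string: the lot number, then the part number without hyphens, then the expiration date written as MMDDYY. Using the above example, the barcode would be: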

  • DM1234100619300123120

Each kit must have a valid Part Number and cannot be obsolete. The list of kits can be found through a services endpoint such as:

  • [server name]:[services port number]/smrt-link/bundles/chemistry-pb/active/files/definitions%2FPacBioAutomationConstraints.xml

This services endpoint will list, for each kit, the part numbers (“PartNumber”) and whether it is obsolete (“IsObsolete”). Dates must also be valid, meaning they must exist on the Gregorian calendar.
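
The structure of a kit barcode can be illustrated with a short parser. Several details here are inferred from the examples in this document rather than stated by it, so treat them as assumptions: the lot number is taken to be 6 alphanumeric characters, the part number 9 digits, and the expiration date 6 digits in MMDDYY order. The names `KitBarcode` and `parse_kit_barcode` are hypothetical. The sketch does not query the services endpoint, so it cannot tell whether a part number is known or obsolete.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class KitBarcode(NamedTuple):
    lot_number: str
    part_number: str
    expiration: date


# Assumed layout: 6-character lot, 9-digit part number, MMDDYY date.
KIT_BARCODE = re.compile(r"(?P<lot>[A-Za-z0-9]{6})(?P<part>[0-9]{9})(?P<exp>[0-9]{6})")


def parse_kit_barcode(barcode):
    """Split a kit barcode into its parts; raise ValueError if malformed."""
    match = KIT_BARCODE.fullmatch(barcode)
    if match is None:
        raise ValueError(f"unexpected kit barcode layout: {barcode!r}")
    part = match["part"]
    # strptime raises ValueError for dates that do not exist, e.g. 023020.
    expiration = datetime.strptime(match["exp"], "%m%d%y").date()
    return KitBarcode(
        lot_number=match["lot"],
        part_number=f"{part[:3]}-{part[3:6]}-{part[6:]}",
        expiration=expiration,
    )
```

Applied to the example barcode DM1234100619300123120, this recovers the lot number "DM1234", the part number "100-619-300" and the expiration date 2020-12-31.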

Parameters

The Automation Parameters and Primary Analysis Parameters fields are each a “|” separated list. Each item follows the format: [parameter name]=[parameter type]:[parameter value]. Primary analysis parameters are:

  • Readout
  • MetricsVerbosity
  • CopyFileTrace
  • CopyFileBaz
  • CopyFileDarkFrame
  • CopyStatsH5

Acceptable parameter types are:

  • String
  • Int32
  • UInt32
  • Double
  • Single
  • Boolean
  • DateTime

The parameter names and types are not case-sensitive.
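
A parameter string in this format can be split as sketched below. The function name `parse_parameters` is hypothetical, and the sketch only checks the parameter type against the list above; it does not check that the value is convertible to that type, nor that the name is a known parameter.

```python
PARAMETER_TYPES = {
    "string", "int32", "uint32", "double", "single", "boolean", "datetime",
}


def parse_parameters(text):
    """Parse "name=type:value|name=type:value" into (name, type, value) tuples."""
    parameters = []
    if not text.strip():
        return parameters
    for item in text.split("|"):
        name, equals, rest = item.partition("=")
        type_name, colon, value = rest.partition(":")
        if not name or not equals or not colon:
            raise ValueError(f"expected name=type:value, got {item!r}")
        # Types are not case-sensitive, so compare in lower case.
        if type_name.lower() not in PARAMETER_TYPES:
            raise ValueError(f"unknown parameter type: {type_name!r}")
        parameters.append((name, type_name, value))
    return parameters
```

Only the first ":" separates the type from the value, so a value that itself contains colons (a time of day, for instance) is kept intact.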

Barcoded Sample Names

The barcoded sample names are a “|” separated list. Each item in the list follows the format: [barcode name];[biosample name]. The barcode names must be contained within the specified barcodeset. A given barcode name cannot appear more than once in the list. The biosample names can be any ASCII string but cannot contain the field separators “|” and “;”. The biosample names cannot be longer than 40 characters. A maximum of 384 barcodes is permitted per sample.
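
The rules in this section can be sketched as a single parsing function. The function name and the `barcode_names_in_set` argument (the names available in the chosen barcodeset, which a caller would have to look up) are assumptions for illustration.

```python
MAX_BARCODES = 384
MAX_BIOSAMPLE_LENGTH = 40


def parse_barcoded_sample_names(text, barcode_names_in_set):
    """Parse "barcode;biosample|barcode;biosample" into a list of pairs."""
    pairs = []
    seen = set()
    for item in text.split("|"):
        parts = item.split(";")
        # Exactly one ";" per item also rules out separators inside names.
        if len(parts) != 2:
            raise ValueError(f"expected barcode;biosample, got {item!r}")
        barcode, biosample = parts
        if barcode not in barcode_names_in_set:
            raise ValueError(f"barcode not in barcodeset: {barcode!r}")
        if barcode in seen:
            raise ValueError(f"barcode listed more than once: {barcode!r}")
        if not biosample.isascii():
            raise ValueError(f"biosample name is not ASCII: {biosample!r}")
        if len(biosample) > MAX_BIOSAMPLE_LENGTH:
            raise ValueError(f"biosample name too long: {biosample!r}")
        seen.add(barcode)
        pairs.append((barcode, biosample))
    if len(pairs) > MAX_BARCODES:
        raise ValueError(f"more than {MAX_BARCODES} barcodes in one sample")
    return pairs
```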

Auto Analysis fields

Auto Analysis is only supported on Sequel II. The Auto Analysis fields are Pipeline Id, Analysis Name, Entry Points, and Task Options. You may define one analysis for each collection and bio sample. The Pipeline Id, Analysis Name, and Entry Points fields are required. The Task Options field may be left empty; any task options not specified will use pipeline defaults.
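
The Entry Points and Task Options fields share the same “|” and “;” structure, so one helper can split both. The sketch below uses hypothetical function names and checks only the shape of each entry plus the UUID syntax; it does not verify that a pipeline, task option, or dataset actually exists.

```python
import uuid


def _split_entries(text, field_names):
    """Split "a;b;c|a;b;c" into a list of dicts keyed by field_names."""
    entries = []
    if not text.strip():
        return entries
    for item in text.split("|"):
        # Limit the split so a trailing value may itself contain ";".
        parts = item.split(";", len(field_names) - 1)
        if len(parts) != len(field_names):
            raise ValueError(f"expected {';'.join(field_names)}, got {item!r}")
        entries.append(dict(zip(field_names, parts)))
    return entries


def parse_entry_points(text):
    """Parse Entry Points entries of the form file_type;entry_id;uuid."""
    entries = _split_entries(text, ("file_type", "entry_id", "uuid"))
    for entry in entries:
        uuid.UUID(entry["uuid"])  # raises ValueError if not a well-formed UUID
    return entries


def parse_task_options(text):
    """Parse Task Options entries of the form task_id;value_type;value."""
    return _split_entries(text, ("task_id", "value_type", "value"))
```

An empty Task Options field yields an empty list, which matches the rule that unspecified options fall back to pipeline defaults. Because Entry Points is required, a caller would additionally reject an empty result from `parse_entry_points`.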