=======================================
Run Design CSV specification for PacBio
=======================================

The Run Design CSV is a comma-separated file which can be imported into SMRT Link to create a run design. Each line in the CSV represents a sample.

+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Key                                    | Value Example                                                              | Value Spec                                                        | INTERNAL |
+========================================+============================================================================+===================================================================+==========+
| Experiment Name                        | NoRS_Standard_Edna.1                                                       | Can be any ASCII string. Defaults to Run Name.                    |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Experiment Id                          | 325/3250057                                                                | Must be a valid experiment ID. Details below.                     |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Experiment Description                 | 20170530_A6_Iguana_VVnC_SampleSheet_TEMPLATE                               | Can be any ASCII string. Defaults to Run Comments.                |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Run Name                               | 20170530_A6_Iguana_VVnC_SampleSheet_TEMPLATE                               | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| System Name                            | Sequel                                                                     | Must be either Sequel or Sequel II                                |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Run Comments                           | ecoliK12_pbi_March2013                                                     | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Is Collection                          | TRUE                                                                       | Must be a Boolean value. Boolean details below.                   |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sample Well                            | A01                                                                        | Must be a valid well number. Details below.                       |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sample Name                            | SMS_Iguana_A6_3230046_A01_TestCase_SB_BindKit_ChemKitv2_8rxnKit            | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Cell No.                               | 1                                                                          | Must be an integer from 1 to 8. Details below.                    | True     |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sequencing Mode                        | CLR                                                                        | Must be either CLR or CCS.                                        |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Generate CCS Data                      | FALSE                                                                      | Must be a Boolean value. Boolean details below. Default is False. |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Movie Time per SMRT Cell (hours)       | 5                                                                          | Must be a float >= 0.1 and <= 20 for Sequel, <= 30 for Sequel II  |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Use Predictive Loading                 | TRUE                                                                       | Must be a Boolean value. Boolean details below.                   |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Loading Target (P1 + P2)               | 0.4                                                                        | Must be a float between 0.01 and 1.                               |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Maximum Loading Time (hours)           | 1.2                                                                        | Must be a float between 1 and 2.                                  |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sample Comment                         | SMS_Iguana_A6_3230046_A01_TestCase_SB_BindKit_ChemKit                      | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Insert Size (bp)                       | 2000                                                                       | Must be an integer >= 10. Units are in bp.                        |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| On-Plate Loading Concentration (pM)    | 5                                                                          | Must be a float. Units are in pM.                                 |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Size Selection                         | FALSE                                                                      | Must be a Boolean value. Boolean details below. Default is False. |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Stage Start                            | FALSE                                                                      | Must be a Boolean value. Boolean details below. Default is False. | True     |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Reuse Sample                           | FALSE                                                                      | Must be a Boolean value. Boolean details below. Default is False. | True     |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Template Prep Kit Box Barcode          | DM1117100259100111716                                                      | Must be valid kit barcode. Details below.                         |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| DNA Control Complex Box Barcode        | DM1234101084300123120                                                      | Must be valid kit barcode. Details below.                         |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Binding Kit Box Barcode                | DM1117100862200111716                                                      | Must be valid kit barcode. Details below.                         |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sequencing Kit Box Barcode             | DM0001100861800123120                                                      | Must be valid kit barcode. Details below.                         |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Wash Kit Box Barcode                   | DM2222100866100123120                                                      | Must be valid kit barcode. Details below.                         |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Automation Name                        | Diffusion                                                                  | Can be "diffusion", "magbead", or a custom script. Details below. |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Automation Parameters                  | ExtensionTime=double:60|ExtendFirst=boolean:True                           | Must follow format demonstrated in Value Example. Details below.  |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Primary Analysis                       | Default                                                                    | Can be any ASCII string.                                          | True     |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Primary Analysis Parameters            | CopyFileTrace=boolean:true                                                 | Must follow format demonstrated in Value Example. Details below.  | True     |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Sample is Barcoded                     | TRUE                                                                       | Must be a Boolean value. Details on booleans below.               |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Barcode Set                            | dad4949d-f637-0979-b5d1-9777eff62008                                       | Must be a uuid for a barcodeset present in the database.          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Same Barcodes on Both Ends of Sequence | TRUE                                                                       | Must be a Boolean value. Details on booleans below.               |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Barcode Name                           | lbc1--lbc1                                                                 | Must be a valid barcode pair.                                     |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Bio Sample Name                        | sample1                                                                    | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Pipeline Id                            | pbsmrtpipe.pipelines.sa3_ds_isoseq3_with_genome                            | Must be a valid pbsmrtpipe pipeline Id                            |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Analysis Name                          | sample1 analysis                                                           | Can be any ASCII string.                                          |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Entry Points                           | PacBio.DataSet.BarcodeSet;eid_barcode;afe89e3f-17ca-e9b8-eae9-b701dbb1f02d | A "|" separated list with entries: file_type;entry_id;uuid        |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+
| Task Options                           | isocollapse.task_options.allow_extra_5exon;boolean;false                   | A "|" separated list with entries: task_id;value_type;value       |          |
+----------------------------------------+----------------------------------------------------------------------------+-------------------------------------------------------------------+----------+

General Requirements
--------------------

The CSV may only contain ASCII characters.
Specifically, it must satisfy the regular expression:

- ``/^[\x00-\x7F]*$/g``

Required Fields
---------------

- Run Name
- Sample Well
- Sample Name
- Movie Time per SMRT Cell (hours)
- Insert Size (bp)
- Template Prep Kit Box Barcode
- Binding Kit Box Barcode
- Sequencing Kit Box Barcode

Is Collection
-------------

This field indicates whether the line specifies a collection (TRUE) or a barcoded sample (FALSE). Collection lines should leave the Barcode Name and Bio Sample Name fields blank. Barcoded sample lines only need to contain the Is Collection, Sample Name, Barcode Name, and Bio Sample Name fields.

Experiment ID
-------------

Experiment IDs cannot contain the following characters: ``<, >, :, ", \, |, ?, *, or )``.
Experiment IDs cannot start or end with a "/" and cannot have two adjacent "/", i.e. "//". Experiment IDs also cannot contain spaces.
Specifically, an Experiment ID must not match any of the following regular expressions:

- ``/[<>:"\\|?\*]/g``
- ``/(?:^\/)|\/\/|(?:\/$)/``
- ``/ /g``

Cell No.
--------

The cell numbers must satisfy a valid cell re-use scheme.

Sample Well
-----------

The well number must start with a letter "A" through "H", and end in a number "01" through "12", i.e. "A01" through "H12".
In other words, it must satisfy the regular expression:

- ``/^[A-H](?:0[1-9]|1[0-2])$/``

Automation Name
---------------

The automation name can be either "diffusion" or "magbead" and is not case-sensitive. A path can also be used, such as "/path/to/my/script/my_script.py".
The path will not be processed further, so if the full URI is required, it must be provided in the CSV, e.g. "chemistry://path/to/my/script/my_script.py".

Boolean Values
--------------

Acceptable boolean values for true are:

- "true"
- "t"
- "yes"
- "y"

Acceptable boolean values for false are:

- "false"
- "f"
- "no"
- "n"

Boolean values are not case-sensitive.
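
The Sample Well, Experiment ID, and boolean rules above could be checked with a short Python sketch like the one below. It is only an illustration: the function names are hypothetical and are not part of SMRT Link, and the Experiment ID check follows the three regular expressions exactly as written (which do not include ``)``).

```python
import re

# Patterns taken from the sections above, rewritten in Python syntax.
WELL_PATTERN = re.compile(r"[A-H](?:0[1-9]|1[0-2])")
EXPERIMENT_ID_FORBIDDEN = (
    re.compile(r'[<>:"\\|?*]'),       # forbidden characters
    re.compile(r"(?:^/)|//|(?:/$)"),  # leading, doubled, or trailing "/"
    re.compile(r" "),                 # spaces
)
TRUE_VALUES = {"true", "t", "yes", "y"}
FALSE_VALUES = {"false", "f", "no", "n"}


def is_valid_well(well):
    """Return True for wells "A01" through "H12"."""
    return WELL_PATTERN.fullmatch(well) is not None


def is_valid_experiment_id(experiment_id):
    """Return True when none of the forbidden patterns match."""
    return not any(p.search(experiment_id) for p in EXPERIMENT_ID_FORBIDDEN)


def parse_boolean(value):
    """Convert an accepted boolean spelling to bool, ignoring case."""
    lowered = value.lower()
    if lowered in TRUE_VALUES:
        return True
    if lowered in FALSE_VALUES:
        return False
    raise ValueError(f"not an accepted boolean value: {value!r}")
```

For example, ``is_valid_well("A01")`` and ``is_valid_experiment_id("325/3250057")`` both return ``True``, while ``parse_boolean("maybe")`` raises ``ValueError``.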

Kit Barcodes
------------

The kit barcodes are composed of three parts:

- Lot Number (ex: "DM1234")
- Part Number (ex: "100-619-300")
- Expiration Date (ex: "2020-12-31")

These are concatenated into a single string, with the dashes removed from the Part Number and the Expiration Date written as MMDDYY. Using the above example, the barcode would be:

- DM1234100619300123120

Each kit must have a valid Part Number and cannot be obsolete. The list of kits can be found through a services endpoint such as:

- [server name]:[services port number]/smrt-link/bundles/chemistry-pb/active/files/definitions%2FPacBioAutomationConstraints.xml

This services endpoint lists, for each kit, the part number ("PartNumber") and whether the kit is obsolete ("IsObsolete").

Dates must also be valid, meaning they must exist on the Gregorian calendar.

Parameters
----------

The parameters are a "|" separated list. Each item follows the format: [parameter name]=[parameter type]:[parameter value].

Primary analysis parameters are:

- Readout
- MetricsVerbosity
- CopyFileTrace
- CopyFileBaz
- CopyFileDarkFrame
- CopyStatsH5

Acceptable parameter types are:

- String
- Int32
- UInt32
- Double
- Single
- Boolean
- DateTime

The parameter names and types are not case-sensitive.

Barcoded Sample Names
---------------------

The barcoded sample names are a "|" separated list. Each item in the list follows the format: [barcode name];[biosample name]

The barcode names must be contained within the specified barcodeset. A given barcode name cannot appear more than once in the list.

The biosample names can be any ASCII string but cannot contain the field separators "|" and ";". The biosample names cannot be longer than 40 characters.

A maximum of 384 barcodes is permitted per sample.

Auto Analysis fields
--------------------

Auto Analysis is only supported on Sequel II. These fields include: Pipeline Id, Analysis Name, Entry Points, Task Options.
You may define one analysis for each collection and bio sample. The Pipeline Id, Analysis Name, and Entry Points fields are required.
The Task Options field may be left empty; any task options not specified will use the pipeline defaults.
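
The kit barcode, parameter, and barcoded sample name formats described earlier can be sketched in Python as follows. This is a rough illustration rather than SMRT Link's own code: all function names are hypothetical, the MMDDYY date encoding is inferred from the kit barcode example, treating an empty parameter field as "no parameters" is an assumption, and the check that barcode names belong to the specified barcodeset is omitted because it needs the database.

```python
from datetime import date

PARAMETER_TYPES = {"string", "int32", "uint32", "double", "single",
                   "boolean", "datetime"}
MAX_BIOSAMPLE_LENGTH = 40
MAX_BARCODES_PER_SAMPLE = 384


def build_kit_barcode(lot_number, part_number, expiration):
    """Join lot number, dash-free part number, and MMDDYY expiration date."""
    return f"{lot_number}{part_number.replace('-', '')}{expiration:%m%d%y}"


def parse_parameters(field):
    """Parse "name=type:value|name=type:value" into {name: (type, value)}."""
    parameters = {}
    if not field:  # assumption: an empty field means no parameters
        return parameters
    for item in field.split("|"):
        name, equals, typed_value = item.partition("=")
        type_name, colon, value = typed_value.partition(":")
        # Types are not case-sensitive, so compare in lower case.
        if not equals or not colon or type_name.lower() not in PARAMETER_TYPES:
            raise ValueError(f"malformed parameter: {item!r}")
        parameters[name] = (type_name.lower(), value)
    return parameters


def parse_barcoded_samples(field):
    """Parse "barcode;biosample|barcode;biosample" into {barcode: biosample}."""
    samples = {}
    for item in field.split("|"):
        barcode_name, semicolon, bio_sample = item.partition(";")
        if not semicolon or ";" in bio_sample:
            raise ValueError(f"malformed barcoded sample: {item!r}")
        if barcode_name in samples:
            raise ValueError(f"duplicate barcode name: {barcode_name!r}")
        if len(bio_sample) > MAX_BIOSAMPLE_LENGTH:
            raise ValueError(f"biosample name too long: {bio_sample!r}")
        samples[barcode_name] = bio_sample
    if len(samples) > MAX_BARCODES_PER_SAMPLE:
        raise ValueError("more than 384 barcodes in one sample")
    return samples
```

With the example from the Kit Barcodes section, ``build_kit_barcode("DM1234", "100-619-300", date(2020, 12, 31))`` gives ``DM1234100619300123120``. Parameter values are returned as strings; converting them to the declared type is left out of this sketch.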