
Data Import

BaseJumper supports importing FASTQ files.

File Naming Requirements

To be imported successfully, FASTQ files must follow this naming convention:

Paired-End Reads:

BIOSAMPLE-NAME_S1_L001_R1_001.fastq.gz

BIOSAMPLE-NAME_S1_L001_R2_001.fastq.gz

During import, BaseJumper automatically merges data that share the same biosample name. This is particularly useful for multi-lane data, where one biosample has a separate pair of FASTQ files for each lane (for example, L001 and L002).
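To preview which files will be merged, you can group your FASTQ files by biosample name locally. The bash sketch below assumes every file follows the naming convention above; biosample_of and preview_merge_groups are hypothetical helper names, not part of BaseJumper.

```bash
#!/usr/bin/env bash
# Sketch: preview how FASTQ files group by biosample name.
# biosample_of and preview_merge_groups are hypothetical helper names.

# Print the biosample name of a file named
# BIOSAMPLE-NAME_S<n>_L<lane>_R<read>_001.fastq.gz (directory part ignored).
biosample_of() {
  local name=${1##*/}
  # Remove the shortest trailing "_S*_L###_R#_001.fastq.gz" suffix.
  printf '%s\n' "${name%_S*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz}"
}

# Print "<file count> <biosample>" for every FASTQ file in a directory.
preview_merge_groups() {
  local dir=${1:-.} f
  for f in "$dir"/*_R[12]_001.fastq.gz; do
    [ -e "$f" ] || continue   # no matches: the glob stays unexpanded
    biosample_of "$f"
  done | sort | uniq -c
}
```

For example, preview_merge_groups /path/to/fastq prints one line per biosample with its file count; a biosample sequenced on two lanes (R1 and R2 for each lane) shows a count of 4.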

⚠ Note: When choosing a BIOSAMPLE-NAME, it is strongly recommended to avoid underscores (_), periods (.), and any other special characters.
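You can also screen filenames before uploading. This bash sketch applies a stricter rule than the note requires, assuming biosample names contain only letters, digits, and hyphens; the regular expression and the function names check_fastq_name and check_fastq_dir are illustrative assumptions, not BaseJumper's own validation logic.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check (not BaseJumper's own validator).
# Assumption: biosample names use only letters, digits, and hyphens.
FASTQ_NAME_RE='^[A-Za-z0-9-]+_S[0-9]+_L[0-9]{3}_R[12]_001\.fastq\.gz$'

# Return 0 if the file's basename matches the expected convention.
check_fastq_name() {
  local name=${1##*/}
  [[ $name =~ $FASTQ_NAME_RE ]]
}

# Report every non-conforming FASTQ file in a directory on stderr;
# return non-zero if any file fails the check.
check_fastq_dir() {
  local dir=${1:-.} f status=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue   # no matches: the glob stays unexpanded
    if ! check_fastq_name "$f"; then
      printf 'Invalid name: %s\n' "${f##*/}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_fastq_dir /path/to/fastq lists any non-conforming files and returns a non-zero exit status if at least one is found.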

Data Import Methods

There are three primary methods for importing data into the BaseJumper platform: BaseSpace Sequence Hub (BSSH), Globus, and sFTP.

Importing Data from BaseSpace Sequence Hub (BSSH)

BaseJumper natively supports BaseSpace Sequence Hub (BSSH) and interfaces with it programmatically through its REST API. If you provide a security token with READ access to your BSSH account, BaseJumper can access your primary data directly. Add this token to the workspace settings of your user account so that BaseJumper can stage the data directly to your workspace.

Obtaining a BSSH Access Token

  1. Log in to BSSH.

  2. Go to Apps:

    • After logging in, click the "Apps" link at the top of the page.
  3. Access the App Developer Portal:

    • In the menu on the right side of the page, select "App Developer Portal".
  4. Navigate to "My Apps":

    • Within the App Developer Portal, click on the "My Apps" section.
  5. Create a New Application:

    • Select the blue button titled "Create a new Application".

    • Complete the required fields (placeholder values are acceptable for testing) and follow the on-screen instructions to finish creating the app.

  6. Access Credentials:

    • After the application is created, click the "Credentials" tab next to the app's name.
  7. Obtain Your Access Token:

    • The "Your Access Token" section displays your access token. Use this token in your BaseJumper workspace settings and for any BSSH API requests.
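Before adding the token to BaseJumper, you may want to confirm that BSSH accepts it. The bash sketch below assumes the BSSH API v2 endpoint /v2/users/current on https://api.basespace.illumina.com and the x-access-token request header; confirm both against the BSSH API reference, and set BSSH_API_SERVER if your account is hosted on a different BaseSpace instance. BSSH_API_SERVER, BSSH_ACCESS_TOKEN, and check_bssh_token are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only. Assumptions: the BSSH v2 endpoint /v2/users/current and the
# x-access-token header. BSSH_API_SERVER, BSSH_ACCESS_TOKEN, and
# check_bssh_token are hypothetical names.
BSSH_API_SERVER=${BSSH_API_SERVER:-https://api.basespace.illumina.com}

# Usage: check_bssh_token [TOKEN]   (falls back to $BSSH_ACCESS_TOKEN)
# Returns 0 if the API answers successfully, 2 if no token was given,
# and curl's non-zero exit status on network or HTTP errors.
check_bssh_token() {
  local token=${1:-${BSSH_ACCESS_TOKEN:-}}
  if [ -z "$token" ]; then
    echo "check_bssh_token: no access token given" >&2
    return 2
  fi
  # -f makes curl fail on HTTP errors such as 401 Unauthorized;
  # -sS hides the progress meter but still prints errors.
  curl -fsS -H "x-access-token: $token" \
    "$BSSH_API_SERVER/v2/users/current" > /dev/null
}
```

For example, check_bssh_token "$TOKEN" && echo "Token accepted" prints the message only if the request succeeds.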

Importing Data from Globus

In addition to BaseSpace Sequence Hub, BaseJumper can natively import data from Globus, a widely used standard for research data management and file transfer. Globus moves data across cloud providers and network storage systems, and many universities already have it set up. To learn more about Globus, visit globus.org.

After you import files into Globus, their S3 path can be connected directly to your workspace in BaseJumper.

Creating a Globus Account

To set up Globus, follow these steps:

Step 1: You will receive an email invitation to join a group.

Step 2: Click “Click here to apply for membership.” On the Globus login page, create a new account using Google or ORCID, then click "Continue".

Step 3: You may be asked for additional information, such as your organization and whether Globus will be used for commercial purposes. Complete the form and click "Continue".

Step 4: Grant Globus the permissions it requests to use your identity to access information and perform actions (such as file transfers) on your behalf.

Step 5: When prompted, enter your name and organization information, then click 'Accept Invitation' to accept the Globus group invitation.

Step 6: A message will appear stating 'Membership Active'. You can now click 'Visit the Group' to access the group overview.

Step 7: To access files managed by your group, start at the File Manager. The first time you use the File Manager, all fields will be blank.

Step 8: Click the Collection field at the top of the File Manager page. This opens a page titled 'Collection Search'.

Step 9: To see your group's collection, click “Shared With You”.

Step 10: Click on the group name to access all files managed by your group.

Importing Data into Globus

There are two primary methods for importing data into Globus: Globus Connect Personal and the Globus Command Line Interface (CLI).


Installation and Setup of Globus Connect Personal

Purpose: Enables personal computers or devices to function as accessible endpoints within the Globus ecosystem.

Procedure:

  • Install Globus Connect Personal.
  • Use its graphical interface to designate your machine as an endpoint.
  • Directories you designate on your device then become available for data transfers.

Benefits: Its graphical interface simplifies setup for users who prefer visual tools over command-line operations.


Installation and Setup of Globus Command Line Interface (CLI)

Purpose: A command-line interface allowing users to manage and initiate data transfers within the Globus service.

Procedure:

  • Install the Globus CLI and authenticate with it.
  • Use terminal commands to control and manage data transfers.

Benefits: Provides a hands-on approach for those comfortable with terminal commands. Ideal for scripting, automating tasks, and integrating Globus operations into existing workflows.


To use the Globus transfer tools from the command line, install the Globus Command Line Interface (CLI), which is distributed as a Python package. To install and set it up:

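  1. Install pipx, a tool for installing Python command-line applications in isolated environments:

    python3 -m pip install --upgrade pip
    python3 -m pip install --user pipx
    python3 -m pipx ensurepath    # adds pipx-installed commands to PATH; open a new terminal afterwards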
    

  2. Install the Globus CLI using pipx (the second command upgrades an existing installation):

    pipx install globus-cli
    pipx upgrade globus-cli
    

  3. Log in to Globus:

    globus login
    
    You will be given a link to authenticate with Globus, which then provides an authorization code to enter in the terminal.

    On success, the CLI prints: You have successfully logged in to the Globus CLI!

    To check your primary identity, run:

      globus whoami
    

    To see which of your identities are in the current session, run:

      globus session show
    

    To log out of the Globus CLI, run:

      globus logout
    
  4. Search for your group's endpoint, replacing "Globus Group Name" with the name of your Globus group:

    globus endpoint search --filter-scope=all "Globus Group Name"
    

    The output should look something like this:

    ID                                   | Owner                                       | Display Name
    ------------------------------------ | ------------------------------------------- | -------------------
    

  5. Show the details of the guest collection, replacing "ID" with the value from the ID column of the previous output:

    globus collection show "ID"
    


Installation and Setup of Globus Connect Personal

  1. Download and extract the latest version of Globus Connect Personal:

    wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
    tar -xzf globusconnectpersonal-latest.tgz
    cd globusconnectpersonal-3.2.2    # the directory name includes the downloaded version; adjust it if needed
    
  2. Run Globus Connect Personal:

    ./globusconnectpersonal
    

    • If this machine has an old endpoint configured, clear it first by deleting the ~/.globusonline/ directory.
    • During this process, you will be asked to create an endpoint.
  3. Navigate to the Globus Connect Personal directory and start it:

    cd ~/environment/globusconnectpersonal-3.2.2
    ./globusconnectpersonal -start
    


Transferring Files to and from Globus

Once the Globus CLI and Globus Connect Personal are set up, you can transfer files.

For example, to transfer the file input.txt from your local machine to your group's Globus endpoint:

globus transfer 0cf6b284-fbc1-11ed-9bbb-c9bb788c490e:/home/ubuntu/environment/input.txt ec60609b-ed16-4787-bd32-946354639508:/test/input.txt

In this command, 0cf6b284-fbc1-11ed-9bbb-c9bb788c490e is the endpoint ID of your local machine and ec60609b-ed16-4787-bd32-946354639508 is the endpoint ID of "Globus Group Name". The path after each ID specifies the source file to transfer and the destination path where it will be saved.
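When scripting transfers, it can help to keep endpoint IDs in variables and check their format before building the ENDPOINT_ID:PATH arguments used in the command above. In this bash sketch, is_uuid and globus_ref are hypothetical helpers; the check only verifies the UUID layout, not that the endpoint exists or that you can access it.

```bash
#!/usr/bin/env bash
# Sketch: validate endpoint IDs before composing ENDPOINT_ID:PATH arguments
# for `globus transfer`. is_uuid and globus_ref are hypothetical helpers.

# Return 0 if $1 has the 8-4-4-4-12 hexadecimal UUID layout.
is_uuid() {
  local re='^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
  [[ $1 =~ $re ]]
}

# Print ENDPOINT_ID:PATH, or fail if the ID is not a UUID.
globus_ref() {
  if ! is_uuid "$1"; then
    echo "globus_ref: not an endpoint ID: $1" >&2
    return 1
  fi
  printf '%s:%s\n' "$1" "$2"
}

# Example, reusing the IDs from the command above:
#   src=$(globus_ref 0cf6b284-fbc1-11ed-9bbb-c9bb788c490e /home/ubuntu/environment/input.txt) &&
#   dst=$(globus_ref ec60609b-ed16-4787-bd32-946354639508 /test/input.txt) &&
#   globus transfer "$src" "$dst"
```

Chaining the assignments with && ensures globus transfer only runs when both IDs pass the check.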

To transfer a directory from the guest collection to your local machine:
```bash
globus transfer -r 71de91d8-ad5d-4023-8de1-b2d7334a0345:/export_data/ 83fc7510-42a4-11ee-a06c-eb83daae1adf:/home/ubuntu/environment/test/
```

In this command, 71de91d8-ad5d-4023-8de1-b2d7334a0345 is the endpoint ID of the "Globus Guest Collection" and 83fc7510-42a4-11ee-a06c-eb83daae1adf is the endpoint ID of your local machine. The path after each ID specifies the source directory to transfer and the destination directory where the data will be saved; the -r flag transfers the directory's contents recursively.
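To submit a transfer from a script and wait for it to finish, the Globus CLI can print only the task ID of the submitted transfer, which you can then pass to globus task wait. The bash sketch below follows that pattern using the CLI's --jmespath and --format options; transfer_and_wait is a hypothetical helper name, and its arguments use the same ENDPOINT_ID:PATH form as the command above.

```bash
#!/usr/bin/env bash
# Sketch: submit a recursive Globus transfer, print its task ID, and block
# until the task finishes. transfer_and_wait is a hypothetical helper name.
# Usage: transfer_and_wait SOURCE_ID:/path/ DESTINATION_ID:/path/
transfer_and_wait() {
  local src=$1 dst=$2 task_id
  # --jmespath 'task_id' with --format unix prints only the task ID
  task_id=$(globus transfer -r "$src" "$dst" \
    --jmespath 'task_id' --format unix) || return 1
  echo "$task_id"
  # Block until the task reaches a final state; inspect the outcome
  # afterwards with `globus task show TASK_ID`.
  globus task wait "$task_id"
}
```

For example, transfer_and_wait 71de91d8-ad5d-4023-8de1-b2d7334a0345:/export_data/ 83fc7510-42a4-11ee-a06c-eb83daae1adf:/home/ubuntu/environment/test/ submits the same transfer as above and returns once the task has finished.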

Importing Data via Secure File Transfer Protocol (sFTP)

To import data using sFTP, follow these steps:

  1. Retrieve the private key from the "Private Key Mac/Linux" field in Keeper.

    • You will receive an email with a link to the private key through Keeper, our zero-trust sharing tool.
  2. Open the link and copy the contents of the "Private Key Mac/Linux" field.

  3. Open a terminal (an already open terminal window works too) and create the key file:

    nano ~/bioskrybkey.pem
    

  4. Paste the contents of the "Private Key Mac/Linux" field into the editor. To save and exit nano, press Ctrl+X (Control + X on a Mac), then Y, then Enter.

  5. Restrict the file's permissions so that only you can read and write it:

    chmod 600 ~/bioskrybkey.pem
    

  6. Connect to the BioSkryb sFTP server with the command below, replacing "username" with the username provided in Keeper:

    sftp -i ~/bioskrybkey.pem -o ServerAliveInterval=9999 username@sftp.bioskryb.com
    

  7. On the first connection, you will be asked whether to trust the server's host key. Type 'yes' and press Enter.

  8. You will then be prompted for the key's passphrase, which can be found in Keeper. Remember the passphrase or store it in a password manager to adhere to our security standards.

    The final result should resemble the following:

    sftp -i ~/bioskrybkey.pem -o ServerAliveInterval=9999 username@sftp.bioskryb.com
    Warning: Permanently added the RSA host key for IP address '3.223.137.129' to the list of known hosts.
    Welcome to the BioSkryb sftp upload server.
    Enter passphrase for key '/home/ubuntu/bioskrybkey.pem':
    Connected to sftp.bioskryb.com.
    sftp>
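For larger uploads, you may prefer a non-interactive session. OpenSSH's sftp accepts a batch file of commands with -b, but batch mode cannot prompt for the key passphrase, so the key must first be loaded into ssh-agent, and the server's host key must already be trusted (for example, from the interactive login above). The bash sketch below uses hypothetical helpers key_perms_ok and write_sftp_batch to check the permissions set in step 5 and to write one put command per file.

```bash
#!/usr/bin/env bash
# Sketch: prepare a non-interactive sftp upload. key_perms_ok and
# write_sftp_batch are hypothetical helper names.

# Return 0 if the file's mode is exactly -rw------- (as set by chmod 600).
# ssh refuses private keys that other users can read.
key_perms_ok() {
  [ "$(ls -l -- "$1" | cut -c1-10)" = "-rw-------" ]
}

# Print an sftp batch file with one quoted "put" command per argument.
write_sftp_batch() {
  local f
  for f in "$@"; do
    printf 'put "%s"\n' "$f"
  done
}

# Example session (replace "username" as in step 6):
#   key_perms_ok ~/bioskrybkey.pem || chmod 600 ~/bioskrybkey.pem
#   eval "$(ssh-agent -s)" && ssh-add ~/bioskrybkey.pem   # enter the passphrase once
#   write_sftp_batch *.fastq.gz > upload.batch
#   sftp -i ~/bioskrybkey.pem -b upload.batch username@sftp.bioskryb.com
```

Each put uploads the named file into the current remote directory of the sftp session.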