Ivy
Ivy is a secure computing environment for researchers consisting of virtual machines (Linux and Windows) and Domino Data Lab. Researchers can use Ivy to process and store sensitive data with the confidence that the environment is secure and meets HIPAA requirements.
Overview
Ivy consists of two separate computing environments. Access to one environment does not automatically grant access to the other:
Virtual Machines
Domino Data Lab
Coming soon - Secure HPC
Requesting Access
Access to Ivy resources is project-based, limited to PIs and their designees, and requires approval.
Available Packages
The following bioinformatics packages are available on the Windows Virtual Machines:
Bowtie2
For more information on Bowtie2, please click here
HISAT2
Requires approval before installation. For more information on HISAT2, please click here
Bowtie2 is a memory-efficient tool for aligning short sequences to long reference genomes. It indexes the genome using an FM index, which is based on the Burrows-Wheeler Transform, to keep its memory footprint small. Bowtie2 supports gapped, local, and paired-end alignment modes. Alignment to a known reference using Bowtie2 is often an essential first step in NGS analysis workflows.
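As a rough illustration of the Burrows-Wheeler Transform that the FM index is based on, the Python sketch below computes the transform of a toy string by sorting all of its rotations. It is a teaching example only and not how Bowtie2 is implemented: real indexers avoid materializing every rotation. The function name and the `$` end marker are assumptions made for the illustration.

```python
def burrows_wheeler_transform(text, end_marker="$"):
    """Return the BWT of text using the naive sorted-rotations method."""
    if end_marker in text:
        raise ValueError("text must not contain the end marker")
    s = text + end_marker
    # Build every rotation of the marked string and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(burrows_wheeler_transform("banana"))  # prints annb$aa
```

The transform tends to group identical characters into runs, which is what makes the resulting index compact.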
Bowtie2 Usage
Alignment using Bowtie2 is a two-step process: indexing the reference genome, followed by aligning the sequence data.
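The two steps map onto two command-line programs, `bowtie2-build` and `bowtie2`. The Python sketch below only assembles the argument lists for a single-end run and does not execute anything. The helper name and all file names are hypothetical, and the way the tools are launched on an Ivy Windows VM may differ from this.

```python
def bowtie2_commands(reference_fasta, index_prefix, reads_fastq, output_sam):
    """Build argument lists for the two Bowtie2 steps without running them."""
    # Step 1: index the reference genome.
    build_index = ["bowtie2-build", reference_fasta, index_prefix]
    # Step 2: align unpaired reads (-U) against the index (-x), writing SAM output (-S).
    align_reads = ["bowtie2", "-x", index_prefix, "-U", reads_fastq, "-S", output_sam]
    return build_index, align_reads


if __name__ == "__main__":
    # Hypothetical file names, used for illustration only.
    commands = bowtie2_commands("reference.fa", "reference_index", "reads.fq", "aligned.sam")
    for command in commands:
        print(" ".join(command))
```

On a machine where Bowtie2 is on the PATH, each list could be passed to `subprocess.run` to execute the step.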
Available Packages
The following data analysis packages are available on the Ivy Windows Virtual Machines:
MATLAB
MATrix LABoratory (MATLAB for short) is software designed for quick scientific calculations, such as matrix manipulation and plotting. It has hundreds of built-in functions for a wide variety of computations and several tools designed for specific research disciplines, including statistics and partial differential equations.
* Limited licenses are available. For more information on MATLAB and licensing, please click here
* Please note that HISAT2 requires approval prior to installation on the VM.
HISAT2 is a fast and sensitive tool for aligning short reads against a population of human genomes (as well as against a single reference genome). It indexes the genome using a Hierarchical Graph FM Index (HGFM) strategy, i.e. a large set of small indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp).
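To get a feel for how many small indexes a 56 Kbp region size implies, the Python sketch below does the ceiling division for a given genome length. The genome length used here is an assumed round figure for a human-sized genome, and the calculation ignores any overlap between neighboring regions, so the result is only an order-of-magnitude estimate. The function name is hypothetical.

```python
def local_index_count(genome_length_bp, region_size_bp=56_000):
    """Number of fixed-size regions needed to cover a genome (ceiling division)."""
    return -(-genome_length_bp // region_size_bp)


if __name__ == "__main__":
    # Assumption: a genome of roughly 3.1 billion base pairs.
    assumed_genome_length_bp = 3_100_000_000
    print(local_index_count(assumed_genome_length_bp))
```

Under these assumptions the estimate comes to about 55,000 regions.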
SPSS Overview
SPSS (Statistical Package for the Social Sciences) was initially developed as a social survey project but has since grown to encompass statistical applications in almost all disciplines. Different industries use SPSS for their data analysis work. Its features include database management, reporting, and graphing, among many others.
SPSS Usage
SPSS is currently available only on the Windows VM. To run SPSS, go to:
Start Menu > All Programs > IBM SPSS Statistics
Licensing
We have a limited number of SPSS licenses available, which are provided on a first-come, first-served basis.
IDL Overview
IDL, short for Interactive Data Language, is an interactive, shell-based programming language for data analysis. Widely used in medical imaging, it can quickly create visualizations and graphs of large data sets in a few steps because it operates on whole vectors and arrays. FORTRAN users will find the IDL syntax familiar. IDL is not to be confused with Java IDL or Microsoft IDL.
Licensing
We have a limited number of IDL licenses available, which are provided on a first-come, first-served basis.
MATLAB Overview
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in a single environment. MATLAB stands for MATrix LABoratory. MATLAB was made to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB includes a programming language environment with built-in editing and debugging tools, and supports object-oriented programming.
Programming in MATLAB
MATLAB has many advantages compared to conventional computer languages (e.
SAS Overview
SAS is a command-driven software package used for statistical analysis and data visualization. It is available in . It is one of the most widely used statistical software packages in both industry and academia. It is a good fit for work that draws on a large number of statistical algorithms. It is not limited to any one industry and can be used in both scientific and non-scientific contexts. We currently offer only the Teaching & Research version.
Stata Overview
Stata is a graphical data analysis tool developed by StataCorp; its name is short for Statistics and Data. It is used in various disciplines, including biomedicine, economics, and epidemiology, among others. It is capable of performing statistical analysis, simulations, regression, and data management. Besides the standard version, Stata is also available in an MP version (multiprocessing) and an SE version for large datasets.
NB: Users requesting an installation of Stata are required to bring their own license.
cTAKES Overview
cTAKES, the clinical Text Analysis and Knowledge Extraction System, is a Natural Language Processing (NLP) tool developed at the Mayo Clinic to extract information from clinical records. It is open source and built on the Apache Unstructured Information Management Architecture (UIMA). cTAKES is modular and expandable, covers a number of generic use cases, and comes with best-practice notes.
cTAKES Usage
cTAKES components
Some of the cTAKES components are listed below:
Pre-approved packages
The following software packages are pre-approved for image processing on an Ivy Windows VM:
Axiovision
Axiovision is software for microscopy image processing and analysis.
Axiovision is highly configurable to meet the needs of your individual workflows.
KNIME
KNIME is an open-source analytics platform for data mining and pipelining.
KNIME’s Image Processing Plugin allows users to perform common image processing techniques such as registration, segmentation, and feature extraction. KNIME is compatible with over 120 image file types and can be used alongside ImageJ.
Java SDK Overview
Java SDK 1.8 is installed on the Ivy Windows VMs. Java is a popular object-oriented programming language used in a wide range of applications. It is available under the GNU General Public License for all users. The SDK includes many tools, such as the javac compiler, that help with application development.
Running Java commands from the Command Prompt
Open a Windows Command Prompt and enter java followed by the desired command.
Rodeo Overview
As of the last update, our Windows VMs are installed with Rodeo version 1.3. Rodeo is a lightweight, Python-based IDE for data science. It has a streamlined code-to-plot workflow, with easily extensible packages that make it simpler to analyze difficult patterns in data. It brings many data analysis features together in one place and adopts features from IPython Notebook (it runs on top of the IPython kernel). Like most Python projects, it is open source and available for free.
Perl
As of this writing, Strawberry Perl 5.24 is available on the Windows VMs. Licensed as open source under the GPL, it is most often used to develop mission-critical software and integrates well with markup languages such as HTML and XML. Because it supports both object-oriented and procedural programming, it can be used in a wide range of programming projects. It includes built-in database integration via its DBI module.
Sumatra PDF Overview
Sumatra PDF is open-source software for viewing PDF files on Windows. It can be used to view PDF documents stored within the Ivy VM. As of the latest version, Sumatra supports multiple formats, including PDF, EPUB, MOBI, and XPS.
Running Sumatra PDF
From the Start menu, go to All Programs and search for Sumatra PDF. Click on the icon to run it.
More Information
For more information, visit the Sumatra PDF official website.
R Overview
R is an open-source programming language used by data miners, scientists, data analysts, and statisticians. It is available under the GNU GPL v2 license from the Comprehensive R Archive Network.
R can be used for many statistical, modeling, and graphical tasks. It supports object-oriented programming and is easily extensible.
Running RStudio from the desktop
You can start R in a graphical interface using the RStudio application on the desktop.
Anaconda
Our VMs have Python 2 and 3 available as part of the Anaconda distribution. Anaconda comes installed with many packages suited to scientific computing, data processing, and data analysis, while making deployment very simple. Its package manager, conda, installs and updates Python packages and their dependencies, keeping different package versions isolated on a project-by-project basis. Anaconda is available as open source under the New BSD license. It also ships with pip, the common Python package manager.
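With both Python 2 and Python 3 installed and environments kept separate, it can help to confirm which interpreter and environment a script is actually running in. The sketch below uses only the standard library and should run under either major version. The function name is hypothetical, and the reported prefix depends on how Anaconda is set up on the VM.

```python
import sys


def describe_interpreter():
    """Return the running Python version and the environment prefix it was started from."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"version": version, "major": sys.version_info[0], "prefix": sys.prefix}


if __name__ == "__main__":
    info = describe_interpreter()
    # sys.prefix points at the root of the active installation or environment.
    print("Python {version} running from {prefix}".format(**info))
```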
Microsoft Office Overview
The Ivy Windows VMs are installed with Microsoft Office 2016. Features such as OneDrive are not available because Ivy is not connected to the public internet. Therefore, to move documents into and out of the VM, you must use the Globus DTN.
Software available
The following software is available for use on the Ivy Windows VM:
Word 2016
Excel 2016
PowerPoint 2016
Access 2016
OneNote 2016
Outlook 2016
Publisher 2016
Running Office
All Office software can be accessed from the Start menu using Start > All Programs