Cloud computing is ideal for running flexible, scalable applications on demand, in periodic bursts, or for fixed periods of time. UVA SOMRC works alongside researchers to design, deploy, and run research applications and datasets in Amazon Web Services (AWS), the leading public cloud vendor. This means that server, storage, and database capacity does not have to be estimated or purchased in advance; it can be scaled up or down as your needs change, or programmed to scale dynamically with your application.
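Scaling "dynamically with your application" usually means a rule that compares a load metric with a target value and adjusts the number of running instances. The sketch below is a hypothetical, provider-neutral version of the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * observed / target)); AWS target-tracking policies behave in a similar spirit, but their exact internals are not assumed here. The function name, the CPU-percentage metric, and the min/max bounds are illustrative assumptions.

```python
import math


def desired_instances(current: int, observed: float, target: float,
                      min_size: int = 1, max_size: int = 10) -> int:
    """Return how many instances a proportional scaling rule would request.

    Hypothetical helper: scales the current instance count by the ratio of the
    observed load metric (e.g. average CPU %) to the target value, rounds up so
    capacity is not under-provisioned, then clamps to [min_size, max_size].
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be positive")
    raw = math.ceil(current * observed / target)
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # 4 instances averaging 90% CPU against a 60% target: scale out to 6.
    print(desired_instances(4, 90.0, 60.0))
    # 4 instances averaging 15% CPU against a 60% target: scale in to 1.
    print(desired_instances(4, 15.0, 60.0))
```

Rounding up rather than to the nearest integer is a deliberate bias toward having slightly too much capacity instead of too little.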
Researchers Using the Cloud
| Use case | Description |
|---|---|
| Serverless Web | SoM faculty and researchers can share data, findings, tools, and other resources as static HTML content published to object storage (such as Amazon S3). This simple publishing method can cost only a few dollars a month and requires no server management. |
| Data Lakes | A new paradigm in data storage and processing, data lakes give researchers a central repository for structured and unstructured data of any type or size. Data can then be drawn off for processing, either as real-time streams or through queues for later analysis. |
| Services Alongside HPC | HPC users usually have more than enough computing power to run their jobs. But what if you need a relational or NoSQL database, a messaging service, or offsite storage? Researchers have begun integrating the cloud into their HPC jobs to create, use, and manage external services like these. |
| HIPAA-Compliant Computing | Researchers working on clinical datasets use Ivy, our private virtualized platform, to perform HIPAA-compliant analytics and compute jobs. The platform offers virtual machines, an R/Python data analytics tool, and Hadoop/Spark for larger analytics projects. Many Ivy users work with EPIC clinical data alongside other highly sensitive datasets in their investigations. |
| Workflows & Pipeline Management | Researchers need flexibility in where they run their data pipelines: a personal computer, a lab server, an HPC cluster, or a cloud instance. We are working with faculty to extend some commonly used pipeline tools so that they can create jobs and push them to cloud-based resources, regardless of the cloud vendor. |
| Long-term Cold Storage | AWS Glacier and Google Nearline/Coldline offer researchers "cold" offsite storage for long-term backups of infrequently accessed data. Genomics researchers use Glacier to store terabytes of source data as required by grants and federal research projects. |
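Moving data into cold storage is typically automated with an object-storage lifecycle rule rather than done by hand. A minimal sketch, assuming Amazon S3: the function below builds a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts, transitioning objects under a key prefix to the Glacier storage class after a set number of days. The helper name, the prefix, the rule ID, and the 30-day default are hypothetical; check current storage-class names and minimum-storage-duration charges in the AWS documentation before applying a rule.

```python
def glacier_lifecycle_config(prefix: str, days: int = 30,
                             rule_id: str = "archive-raw-data") -> dict:
    """Build an S3 lifecycle configuration that archives objects to Glacier.

    Hypothetical helper: the returned dict follows the LifecycleConfiguration
    shape used by boto3's put_bucket_lifecycle_configuration. Objects whose
    keys start with `prefix` move to the GLACIER storage class `days` days
    after creation.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Applying the rule requires the boto3 package and AWS credentials, e.g.:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-lab-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=glacier_lifecycle_config("raw/"),
#   )
```

Keeping the rule as plain data makes it easy to review or version-control before it is attached to a bucket.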
Other Common Use Cases
Proofs of concept - To verify that a system or design works, or to benchmark processing speeds, we may use short-lived instances to learn from before building a production system.
Test / Development environments - For installing test packages, trying new ideas, and testing design patterns.
Dynamic / flexible / scaling application stacks - When future traffic or load cannot be determined beforehand, deploying into a dynamic environment means the infrastructure is not locked into a fixed CPU/RAM configuration or scale.
Short-term or fast deployment projects - For near-immediate computing needs, existing users can create new instances on demand.
Container deployments - Run microservices (for example, as Docker containers) in an environment that can load-balance their traffic and maintain container health.
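Container platforms maintain service health partly by sending traffic only to replicas that pass a health check. The sketch below illustrates that idea in plain Python: a round-robin balancer that skips replicas marked unhealthy. It is a hypothetical, in-memory model; the class name, the replica names, and the boolean health flag are assumptions, not the behavior of any particular load balancer or orchestrator.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips unhealthy replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._healthy = {r: True for r in self._replicas}
        self._order = cycle(self._replicas)

    def mark(self, replica, healthy: bool) -> None:
        # In a real platform this flag would come from a periodic health check.
        if replica not in self._healthy:
            raise KeyError(replica)
        self._healthy[replica] = healthy

    def next_replica(self):
        # Try each replica at most once per call; return the first healthy one.
        for _ in range(len(self._replicas)):
            candidate = next(self._order)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy replicas available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    lb.mark("app-2", False)
    print([lb.next_replica() for _ in range(4)])  # app-2 is skipped
```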
Service Oriented Architecture
A key advantage of the cloud is that for many services you do not need to build or maintain the servers that support the service: you simply use it.
Here are some of the building blocks available using cloud infrastructure:
- Containers / Docker
- Analytics / Data Management
- Continuous Integration
- Sensor / IoT Data Streaming
- Messaging Queues
- SMS / Push Integration
- Alexa Skills / Speech Integration
- Serverless Computing
- Code Build / Validation
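With serverless computing, you supply only a function and the provider runs it in response to events. A minimal sketch, assuming AWS Lambda's Python runtime behind an API Gateway proxy integration: Lambda calls a handler with `(event, context)`, and the proxy integration expects a dict containing `statusCode` and a string `body`. The handler name and the `name` query parameter are hypothetical; because the function is plain Python, it can be tested locally by passing a sample event.

```python
import json


def lambda_handler(event, context):
    """Hypothetical handler returning a greeting as an API Gateway proxy response.

    `event` is the request as a dict; `context` carries runtime metadata and is
    unused here, so local tests can pass None.
    """
    # API Gateway sends None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "researcher")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-built sample event; no AWS account needed.
    print(lambda_handler({"queryStringParameters": {"name": "SOMRC"}}, None))
```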
To get an idea of how AWS is used in real-world and research scenarios, visit the AWS Architecture Center or review the reference deployments below, which are drawn from AWS.
Batch Processing
Build auto-scalable batch processing systems such as video, image, or data-stream processing pipelines (PDF)
Large-Scale Processing and Huge Data Sets
Build high-performance computing systems that involve Big Data (PDF)
Time Series Processing
Build elastic systems that process time series data (PDF)
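A common first step in a time-series pipeline is aggregating raw readings into fixed, non-overlapping ("tumbling") windows before storage or analysis. The sketch below shows that step in plain Python; the function name, the 60-second window, the (timestamp, value) input format, and the mean aggregate are illustrative assumptions rather than part of the AWS reference architecture.

```python
from collections import defaultdict


def tumbling_window_means(readings, window_seconds: int = 60):
    """Average (timestamp, value) readings into fixed, non-overlapping windows.

    Timestamps are seconds since the epoch; each reading falls into the window
    starting at timestamp - (timestamp % window_seconds). Returns a dict mapping
    window start -> mean value, ordered by window start.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        start = int(ts) - int(ts) % window_seconds
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    data = [(0, 1.0), (30, 3.0), (61, 10.0), (119, 20.0), (120, 5.0)]
    print(tumbling_window_means(data))  # {0: 2.0, 60: 15.0, 120: 5.0}
```

In an elastic deployment, many workers could each aggregate a slice of the incoming stream this way, since each window is computed independently of the others.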
Solution Architecture & Consulting
We have experience designing and delivering public cloud solutions using industry best practices. If you have a project and would like to discuss options, pricing, design, or implementation, we are available for consultation. Our staff includes an AWS Certified Solutions Architect, and the entire SOMRC team uses AWS for our own internal systems and development.
We also offer in-person, hands-on workshops and sessions on working with the cloud. Workshops cover a range of topics, from creating object storage buckets and simple compute instances to more complex data-driven workflows and Docker containers. If you have an idea for a workshop or would like to schedule training for your lab or group, please contact us.