Facebook Data Centers and the Open Compute Project
Facebook data centers handle traffic and store data for roughly one billion people per day. Designing the hardware and infrastructure needed to scale to this level has required significant innovation from the Facebook Infrastructure team. In this talk, I present some of the notable innovations in data center power, cooling, and server/storage design that make up our seven global data centers. I also present our approach to ensuring efficient resource utilization by our large software code bases. Most of our major hardware and software designs have been released through the Open Compute Project, ensuring that the Facebook philosophy of “making the world more open and connected” also applies to our research and engineering innovations.
Kim Hazelwood is an engineering manager in Facebook’s Infrastructure division, where she leads a performance analysis team that drives the data center server and storage roadmap. Her research interests include computer architecture, performance analysis, and binary translation tools. Prior to Facebook, Kim held positions including a tenured Associate Professor at the University of Virginia, Software Engineer at Google, and Director of Systems Research at Yahoo Labs. She received a PhD in Computer Science from Harvard University in 2004, and is the recipient of an NSF CAREER Award, the Anita Borg Early Career Award, the MIT Technology Review Top 35 Innovators under 35 Award, and the ACM SIGPLAN 10-Year Test of Time Award. She has authored over 50 conference papers and one book.
HPC Storage Systems: Serving Data to the Lunatic Fringe
19 April 2016
Los Alamos National Laboratory
Before the popularity of big data infrastructure, the largest storage
systems in the world were found almost exclusively within High
Performance Computing (HPC) data centers. Although HPC storage systems
are no longer the largest in terms of total capacity, they continue to
address problems unique to simulation-based scientific inquiry. In
this talk, Brad Settlemyer discusses why and how HPC storage systems
differ from their commercial-world counterparts, how storage system
research impacts the HPC data center, and the research and development
efforts needed for HPC storage systems in the coming decade.
Brad Settlemyer is a storage systems researcher and systems programmer
specializing in high performance computing. He received his Ph.D in
computer engineering from Clemson University in 2009 and works as a
research scientist in Los Alamos National Laboratory's HPC Design
group. He has published papers on emerging storage systems, long
distance data movement, network modeling, and storage systems.
Introduction to Samsung's Storage Software Group in San Diego
4 February 2015
Samsung (formerly with Proximal Data)
An informal talk about the entrepreneurial path leading to the formation of Samsung's Storage Software Group in San Diego, the enterprise storage products under development, and a storage veteran's perspective on the future of storage in the enterprise.
Mr. Rory Bolt is Vice President, Storage Software Group at Samsung Semiconductor. Mr. Bolt joined Samsung through the acquisition of Proximal Data, where he was the Founder and Chief Executive Officer. He has worked as a member of the Chief Technology Office of both NetApp and EMC, and served as Chief Technology Officer of Avamar Technologies, Inc. Mr. Bolt has more than 25 years of experience in data storage systems, data protection systems, and high performance computing. Prior to joining Avamar, he held the position of Vice President, Distinguished Fellow, and Chief Storage Architect of the Quantum Storage Solutions Group. Mr. Bolt co-founded @Backup, a service provider of online backup and storage for protecting systems over the Internet, and served as its Chief Technology Officer. As a development manager at OpenVision, he was responsible for the UniTree hierarchical storage management system that served as the basis for the IEEE Mass Storage Reference Model. He also served at Stac Electronics, the Spin Physics division of Eastman Kodak, and Fujitsu Systems of America. At Floating Point Systems, he was responsible for vector processor management software, hardware simulation software, and the network service processor. Mr. Bolt has over a dozen storage-related patents. He holds a Bachelor of Science degree in computer engineering from the University of California, San Diego.
The Write Stuff
27 January 2015
University of Wisconsin, Madison
For the past several years, our research group has been studying
the write path in storage systems, from deep down in disk drives to
far up in applications themselves, and all the layers in between. We
have found (surprisingly) that the straightforward act of writing to disk is not
as simple as it might sound, and leads to numerous correctness and
performance problems in modern systems. In this talk, I will present
a sampling of our work and address some of the larger meta-questions,
such as: Why is getting write right so challenging? How often do such
write problems arise? How should we re-think how we build storage
systems to avoid these problems by design?
Also, if there is time, I will briefly speak about our experience with
FOBs (Free Online Books). We have a free online operating systems
textbook (www.ostep.org) whose chapters have been downloaded
millions of times, and we think that we have found a new and excellent
way to "publish" textbooks in computer science and other fields.
Remzi Arpaci-Dusseau is a full professor in the Computer Sciences department
at the University of Wisconsin–Madison, where he has taught for 15
years. He co-leads a group with his wife Andrea Arpaci-Dusseau; their work
focuses on systems (broadly) but with a special emphasis on file and storage
systems. They have graduated 15 Ph.D. students in their time at Wisconsin, and
some of their innovations, including block-level storage introspection,
transactional checksumming, and fast file system checking, now ship in commercial
systems and are used daily by millions of people. Remzi also cares deeply
about education, and has won the SACM Student Choice Professor of the Year
award four times and the Carolyn Rosner "Excellent Educator" award
once. Chapters from a freely available OS book he and his wife co-wrote, found
at http://www.ostep.org, have been downloaded over 2 million times in the past
few years; the book is in use at numerous institutions around the world. Remzi
is also an active participant in the systems community, having served on
numerous program committees, as well as co-chair of USENIX ATC ’04, FAST ’07,
OSDI ’10, and SOCC ’14; he is also currently an associate editor for ACM
TOCS. Remzi has been a NetApp faculty fellow, an IBM faculty award winner, an
NSF CAREER award winner, and has consulted for numerous storage companies
including NetApp, Fusion-IO, and Huawei. Remzi also has been a visiting
professor at the University of Michigan, EPFL, and Stanford (currently), and a
visiting scientist at Google (also currently). Remzi also serves on three
advisory boards (Samsung DS CTO, Wisconsin OIP, and a stealth startup).
Context-Aware Computing and Memory Technologies, a Revolution in the Making
20 January 2015
The software industry invents a significant new platform about once every decade. I will share with you the basic principles of context-aware computing, and talk about Context Engines and Context Data Handling. This will lead into a technical discussion of how information-versus-technology ratios need to change in order to support the next wave of technology for context analytics and context-mediated transactions. We will talk about the nature of the workloads and how memory and storage technologies need to evolve. I will present the work we at the SanDisk Technology Council are doing in this area, and would like to invite you to innovate with us in inventing the future.
Pankaj Mehra is Senior Fellow at SanDisk Corporation, where he chairs SanDisk Technology Council. Previously, Dr. Mehra was SVP and CTO at enterprise flash technology pioneer Fusion-io, and before that at Whodini, an e-mail analytics company he founded. Appointed a Distinguished Technologist at Hewlett-Packard in 2004, for his groundbreaking work on persistent memory, Pankaj went on to found HP Labs Russia where he was Chief Scientist until 2010, and where he incubated Taxonom.com, a cloud service for automatically creating ontologies from queries, document collections, and examples. He previously served in various research and teaching positions at IIT Delhi, UC Santa Cruz, and NASA's Ames and IBM's T.J. Watson Research Centers. Pankaj’s 48 filed patents, 29 papers, and 3 books cover a range of topics in scalable intelligent systems, and his engineered systems have held TPC-C and Terabyte Sort performance records, and won recognition from NASA and Sandia National Labs. Of his three books, two are on Machine Learning and co-authored with Ben Wah at University of Illinois; the third, coauthored with John Wilkes, et al., is titled Storage, Data, and Information Systems. Pankaj volunteers at MMDS Foundation as their industry liaison chair, and previously served on the editorial boards of IEEE Internet Computing and Transactions on Computers, and on numerous program committees ranging from SuperComputing to International Semantic Web Conference.
Inside the Pure Storage Flash Array: Building a High Performance, Data Reducing Storage System from Commodity SSDs
21 October 2014
The storage industry is currently in the midst of a flash revolution. Today's smartphones, cameras, and many laptops all use flash storage, but the $30 billion a year enterprise storage market is still dominated by spinning disk. Flash has large advantages in speed and power consumption, but its disadvantages (cost, limited overwrites, large erase block size) have prevented it from being a drop-in replacement for disk in enterprise storage environments. This talk will describe the techniques that we've developed at Pure Storage to overcome these obstacles in creating a high-performance flash storage array using commodity SSDs.
We'll describe the design of the Pure FlashArray, an enterprise storage array built from the ground up on relatively inexpensive consumer flash storage. The array and its software, Purity, leverage the advantages of flash while minimizing the downsides. Purity performs all writes to flash in multiples of the SSD erase block size, and keeps data in a key-value store that persists approximate answers to further reduce writes at the cost of extra (cheap) reads. Our key-value store, which includes medium-grained identifiers to enable large numbers of snapshots and a key-range invalidation table, provides other advantages, such as the ability to take nearly instantaneous, zero-overhead snapshots and the ability to bound the size of our metadata structures despite using monotonically increasing unique identifiers for many purposes. Purity also reduces the amount of user data stored on flash through a range of techniques, including compression, deduplication, and thin provisioning. The system relies upon RAID both for reliability and for performance consistency: by avoiding reads to devices that are being written, it ensures more efficient writes and eliminates long-latency reads. The net result is a flash array that delivers sustained read-write performance of over 500,000 8KB I/O requests per second while maintaining uniform sub-millisecond latency and providing a data reduction rate of 6x, averaged across installed systems.
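The erase-block-aligned write path described above can be illustrated with a small sketch (assumptions, not Purity's actual code: `ERASE_BLOCK` and the buffering policy are invented for the example). Incoming writes accumulate in a buffer and are flushed to the SSD only in whole erase-block-sized segments, so the device never has to perform a partial-block rewrite:

```python
ERASE_BLOCK = 256 * 1024  # assumed erase block size in bytes

class SegmentWriter:
    """Accumulate incoming writes and emit them only in erase-block-sized segments."""
    def __init__(self, flush_fn):
        self.buf = bytearray()
        self.flush_fn = flush_fn  # called with exactly ERASE_BLOCK bytes per segment

    def write(self, data: bytes):
        self.buf.extend(data)
        # Flush every complete erase-block-sized segment; keep the remainder buffered.
        while len(self.buf) >= ERASE_BLOCK:
            segment = bytes(self.buf[:ERASE_BLOCK])
            del self.buf[:ERASE_BLOCK]
            self.flush_fn(segment)
```

In this toy form, a write of one erase block plus 100 bytes triggers exactly one full-segment flush and leaves 100 bytes buffered for the next segment.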
Ethan L. Miller is the Symantec Presidential Chair for Storage and Security and a Professor of Computer Science at the University of California, Santa Cruz, where he is the Director of the NSF I/UCRC Center for Research in Storage Systems (CRSS) and Associate Director of the Storage Systems Research Center (SSRC). He received his ScB from Brown in 1987 and his PhD from UC Berkeley in 1995, and has been on the UC Santa Cruz faculty since 2000. He has written over 125 papers covering topics such as archival storage, file systems for high-end computing, metadata and information retrieval, file systems performance, secure file systems, and distributed systems. He was a member of the team that developed Ceph, a scalable high-performance distributed file system for scientific computing that is now being adopted by several high-end computing organizations. His work on reliability and security for scalable and distributed storage is also widely recognized, as is his work on secure, efficient long-term archival storage and scalable metadata systems.
His current research projects, which are funded by the National Science Foundation, Department of Energy, and industry support for the CRSS and SSRC, include long-term archival storage systems, scalable metadata and indexing structures, high performance petabyte-scale storage systems, and file systems for non-volatile memory technologies. Prof. Miller's broader interests include file systems, parallel and distributed systems, operating systems, and computer security. In addition to research and teaching in storage systems and operating systems, Prof. Miller has worked with Pure Storage since its founding in 2009 to help develop affordable all-flash storage based on commodity SSDs for enterprise environments.
Technology Trends: A Storage Vendor's Perspective
8 April 2014
The storage industry has changed substantially since 1956. The entire
landscape, with its attendant assumptions, is shifting under the feet of
storage vendors. There are many opportunities, and many challenges, in
the near future. This talk will briefly survey some of the driving
trends keeping company strategists up at night.
Doug Santry earned his PhD from Cambridge University. He is best known
for the Elephant file-system. He is also an original member of the Xen
hypervisor team at Cambridge. Taking a break from systems he worked as a
quant for a leading investment bank where he formed strong opinions on the
fitness for purpose of modern storage systems. His research interests at
NetApp focus on making modern storage systems more useful for scientific
and mathematical computation.
Error Management and Control For Data-Center Based Next-Generation Solid State Drives
14 January 2014
There can be no doubt that NAND flash based Solid State Drives (SSDs) are having a massive impact on the design and implementation of data centers. SSDs allow data centers to accelerate applications deployed as cloud services, provide a low-latency tier of storage, and reduce the Total Cost of Ownership (TCO) of the data center infrastructure. The huge demand for flash by data center vendors and cloud service providers is tempered only by the cost per GB of an SSD. As such, there is huge pressure on SSD vendors to produce high-capacity, high-endurance SSDs at lower cost. One way to achieve this goal is to use smaller-lithography flash to permit more bits per unit area and hence lower cost per bit. However, as one scales down NAND flash, the endurance of the flash tends to drop as well. We can compensate for this by using more advanced error correction and media management techniques. For this reason, in this talk we will look at the trends in NAND flash and SSDs with regard to keeping SSDs on this downward trend in terms of cost per bit. We will look at schemes such as LDPC error correcting codes and how they can increase the endurance of an SSD whilst also reducing its cost.
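As a back-of-the-envelope illustration of why stronger codes buy endurance (all numbers below are invented for the sketch, not vendor data): under a simple independent-error model, a code that corrects more bit errors per codeword keeps the uncorrectable-codeword probability low even as the raw bit error rate rises with program/erase cycling.

```python
from math import comb

def p_uncorrectable(n, p, t):
    """Probability that a codeword of n bits with per-bit raw error rate p
    contains more than t errors, defeating a t-error-correcting code
    (independent-error binomial model)."""
    p_ok = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))
    return 1 - p_ok

n = 4096               # assumed codeword length in bits
raw_ber = 1e-2         # assumed raw bit error rate late in flash life
weak, strong = 8, 80   # correctable bits per codeword: weaker vs stronger code

print(p_uncorrectable(n, raw_ber, weak))    # nearly 1: the weak code is defeated
print(p_uncorrectable(n, raw_ber, strong))  # tiny: the strong code still protects
```

With an expected 41 raw errors per 4096-bit codeword, an 8-bit-correcting code is almost always overwhelmed, while an 80-bit-correcting code nearly always succeeds; this is the mechanism by which LDPC-class codes extend usable P/E cycles.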
Stephen Bates received a 1st-class BEng from the University of Edinburgh and a PhD from the same institution, in 1994 and 1997 respectively. His PhD investigated the impact of self-similarity in the traffic that traverses computer networks and its effect on the design of switches for such networks. Stephen joined Massana Ltd. in Dublin, Ireland in 1997. Massana developed physical-layer solutions for the 1000BASE-T Gigabit Ethernet standard. Stephen was an architect for the signal processing sections of this PHY, working on problems such as error correction, echo cancellation, timing recovery, and channel equalization. In September 2003 Stephen joined the University of Alberta's Department of Electrical and Computer Engineering as an assistant professor. His research focused on bridging the gap between algorithm development and silicon implementation. He has published a number of papers in IEEE journals and at international conferences. Stephen is currently a Technical Director in the Chief Strategy and Technology Office of PMC-Sierra, where he helps define PMC's product strategy and works with the business units at PMC to define and design their next-generation products. Stephen is also responsible for identifying key emerging technologies and assisting in mergers and acquisitions. He is a Senior Member of the IEEE and a Professional Engineer of Canada.
Architecting 3D Memory Systems
Die-stacked 3D DRAM technology can provide low-energy, high-bandwidth memory modules by vertically integrating several dies within the same chip. However, the size of such 3D memory modules is unlikely to be sufficient to provide the full memory capacity required for typical systems, so future memory systems are likely to use 3D DRAM together with traditional off-chip DRAM. In this talk, I will discuss how such memory systems can efficiently architect 3D DRAM either as a cache or as main memory.
First, I will show that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement state) are detrimental to the performance of DRAM caches, as they exacerbate hit latency. I will present Alloy Cache, a simple latency-optimized DRAM cache architecture that can outperform even an impractical SRAM Tag-Store design, which would incur an unacceptable overhead of several tens of megabytes.
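A toy sketch of the alloyed idea (the structure below is assumed for illustration; the real design operates on DRAM bursts, not Python objects): storing each tag immediately adjacent to its data line means a single access answers both "is it here?" and "what is the data?", where a serialized tag-then-data design would pay two DRAM accesses on every hit.

```python
NUM_SETS = 1024  # assumed cache size in lines (direct-mapped, one line per set)

class AlloyCache:
    """Direct-mapped DRAM cache with tag and data stored together ('alloyed')."""
    def __init__(self):
        # One (tag, data) pair per set: a single fetch returns both fields.
        self.sets = [None] * NUM_SETS

    def lookup(self, line_addr):
        entry = self.sets[line_addr % NUM_SETS]
        if entry is not None and entry[0] == line_addr:
            return entry[1]   # hit: tag check and data arrived in one access
        return None           # miss: caller must fetch from off-chip memory

    def fill(self, line_addr, data):
        # Direct-mapped: any prior occupant of this set is simply evicted.
        self.sets[line_addr % NUM_SETS] = (line_addr, data)
```

The trade-off this sketch captures: giving up associativity (one line per set) buys a hit path with no separate tag lookup, which is exactly the latency-over-hit-rate choice the talk describes.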
Finally, I will present a memory organization that allows 3D DRAM to be part of the OS-visible memory address space, yet relieves the OS of data migration duties. The proposed CAMEO (CAche-like MEmory Organization) design performs data migration between off-chip memory and 3D DRAM at a line-size granularity, in a manner transparent to the OS. CAMEO outperforms using 3D DRAM only as a cache or only as an OS-managed two-level memory.
Dr. Moinuddin Qureshi joined the faculty of the Georgia Institute of Technology as an Associate Professor in August 2011. His research interests include computer architecture, scalable memory systems, fault tolerant computing, and analytical modeling of computer systems. He worked as a research staff member at IBM T.J. Watson Research Center from 2007 to 2011. While at IBM, he contributed to the design of efficient caching algorithms for Power 7 processors. He was awarded the IBM outstanding technical achievement award for his studies on emerging memory technologies for server processors. He is a recipient of the NetApp Faculty Fellowship (2012) and Intel Early Career Faculty Award (2012). He received his Ph.D. (2007) and M.S. (2003), both in Electrical Engineering from the University of Texas at Austin, and his Bachelor of Electronics Engineering (2000) degree from University of Mumbai.
New Directions in Memory Architecture
3 December 2013
Demand is rapidly escalating for higher-performing, more energy-efficient memory and storage for applications ranging from mobile devices to the cloud. This has serious implications for the open server platform. DRAM faces a major overhaul that will demand new system architectures. Meanwhile, V-NAND, a vertical 3D NAND flash technology, is offering both higher endurance and smaller cell sizes, leading the way to substantive changes in system design. Furthermore, other technologies are on the horizon that together will enable a new generation of “persistent” memory devices. They will have even higher performance than today’s devices and will enable a new tier in both memory and storage architectures.
Hongzhong Zheng received the BS and MS degrees in electrical engineering and computer science from Huazhong University of Science and Technology, China, and the PhD degree in electrical and computer engineering from the University of Illinois at Chicago in 2009. He is currently a memory system architect at the System Architecture Labs of Samsung Semiconductor Inc. He has extensive experience in novel memory system architectures built with DRAM and emerging memory technologies, computer architecture and system performance modeling, and energy-efficient computing system design. He holds more than 15 patents (including issued and pending applications) and has published more than 10 peer-reviewed papers in top journals and conferences. He is a member of the ACM and the IEEE.
Study of the Impact of Fast Non-Volatile Memory on Virtualized Environments
The distance in cycles from the CPU to durable storage has been a key aspect in the design of software. DRAM typically has much lower latency, so reads and writes complete far faster there than on even fast I/O devices like SSDs. Applications typically read data from durable storage into volatile DRAM, process it there, and periodically deposit the results into durable storage. Operating systems batch I/Os and commit them asynchronously. All these latency-hiding techniques may see their value diminish over the coming years due to the rise in capabilities of non-volatile memories. In this talk, we will explore the opportunities and challenges exposed by fast non-volatile memory.
Rajesh Venkatasubramanian has been with VMware since 2005. He has led the development of several vSphere memory management features including large page support, memory compression, and swap caching on SSD. He enthusiastically participates in the evolution of vMotion, distributed resource scheduling (DRS), and all resource-management-related features. He is very excited about the opportunities exposed by fast non-volatile memory and is exploring how it will impact virtualized environments. Rajesh received his PhD in Computer Science and Engineering from the University of Michigan.
De-virtualization in Storage Systems
Yiying Zhang
University of Wisconsin, Madison
Computer systems have become more complex over the past decades. As a result, excess virtualization can happen, where redundant levels of virtualization exist in a single system. For example, with a file system on top of a virtualized storage device, a block is first mapped from its file offset to its logical address and then from its logical address to its physical address. Excess virtualization and the indirection tables to realize layers of virtualization create both memory space and performance overhead.
In this talk, I will present our approaches to de-virtualization, which remove excess virtualization in storage systems. Specifically, I will talk about 1) a new I/O interface called Nameless Writes, which sends only data and no address to the device; the device allocates a physical address and returns it to the file system, which then stores it for future reads; 2) a hardware prototype of Nameless Writes; and 3) a lightweight tool that dynamically removes storage device virtualization by changing file system pointers to point to physical addresses; doing so requires only small OS, device, and interface changes. I will outline the challenges we met and the lessons we learned in designing new interfaces and systems for de-virtualization and in implementing them with real hardware.
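A minimal sketch of the Nameless Writes flow (class and method names below are illustrative, not the actual prototype's API): the device, rather than the file system, chooses the physical address, returns it, and the file system records it for later reads, eliminating the device-internal logical-to-physical indirection table.

```python
class NamelessDevice:
    """Storage device that allocates physical addresses itself."""
    def __init__(self):
        self.blocks = {}      # physical address -> data
        self.next_phys = 0    # trivial allocator for the sketch

    def nameless_write(self, data):
        phys = self.next_phys         # device picks the physical address
        self.next_phys += 1
        self.blocks[phys] = data
        return phys                   # address is returned to the file system

    def read(self, phys):
        return self.blocks[phys]

class FileSystem:
    """File system that stores device-chosen physical addresses in its pointers."""
    def __init__(self, dev):
        self.dev = dev
        self.pointers = {}    # (file, offset) -> physical address

    def write(self, key, data):
        self.pointers[key] = self.dev.nameless_write(data)

    def read(self, key):
        return self.dev.read(self.pointers[key])
```

Because the file system's pointers hold physical addresses directly, reads need no translation step on the device, which is the source of the RAM savings the results below describe.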
Our results show that de-virtualization reduces flash-based SSD internal RAM space cost by 14-54 times and improves random write performance by 20 times compared to traditional SSDs. We also found that integrating new interfaces into existing systems is difficult and using a separate tool can be a good way to dynamically remove excess virtualization.
Yiying Zhang is a Ph.D. Candidate in the Department of Computer Sciences at the University of Wisconsin-Madison. Her advisors are Professors Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. Her research interests are in the Operating Systems area focusing on File and Storage Systems. Before going to Madison, she received her M.S. in Computer Engineering from University of Florida and her B.S. in Computer Science from Fudan University.
Scaling the Memory Wall with Phase Change Memories
As conventional memory technologies such as DRAM run into the scaling wall, architects and system designers are forced to look at alternative technologies for building future computer systems. Several emerging Non-Volatile Memory (NVM) technologies such as PCM, STT-RAM, and Memristors have the potential to boost memory capacity in a scalable and power-efficient manner. However, these technologies are not drop-in replacements and will require novel solutions to enable their deployment. Even the prime candidates among these technologies have their own set of challenges such as higher read latency (than DRAM), much higher write latency, and limited write endurance.
In this talk, I will discuss some of our recent work that addresses these challenges. Our solutions include: hybrid memory systems, start-gap wear leveling, online attack detection, and efficient error correction. These solutions are applicable to a wide variety of emerging NVM technologies, and lay the groundwork for enabling their adoption in a broad spectrum of computer systems.
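Start-gap wear leveling, one of the solutions named above, can be sketched in a few lines (a simplified variant for illustration; the real design triggers a gap move only once every K writes and adds address randomization): N logical lines rotate slowly through N+1 physical lines, so no single physical line absorbs a hot address forever.

```python
class StartGap:
    """Simplified start-gap wear leveling: N logical lines in N+1 physical lines."""
    def __init__(self, n_lines):
        self.n = n_lines
        self.phys = [None] * (n_lines + 1)  # one spare physical line: the gap
        self.start, self.gap = 0, n_lines   # gap initially at the last line

    def _phys_addr(self, logical):
        p = (logical + self.start) % self.n
        return p + 1 if p >= self.gap else p  # skip over the gap slot

    def write(self, logical, data):
        self.phys[self._phys_addr(logical)] = data

    def read(self, logical):
        return self.phys[self._phys_addr(logical)]

    def gap_move(self):
        """Advance the gap by one line (done periodically, not on every write)."""
        if self.gap == 0:
            # Gap wraps: move the line at the top into slot 0 and bump Start,
            # completing one full rotation of the mapping.
            self.phys[0] = self.phys[self.n]
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            self.phys[self.gap] = self.phys[self.gap - 1]
            self.gap -= 1
```

Each gap move copies exactly one line, so the remapping cost is bounded and no indirection table is needed: the logical-to-physical mapping is computed from just the two registers `start` and `gap`.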
Dr. Moinuddin Qureshi joined the faculty of the Georgia Institute of Technology as an Associate Professor in August 2011. His research interests include computer architecture, scalable memory systems, fault tolerant computing, and analytical modeling of computer systems. He worked as a research staff member at IBM T.J. Watson Research Center from 2007 to 2011. While at IBM, he contributed to the design of efficient caching algorithms for Power 7 processors. He was awarded the IBM outstanding technical achievement award for his studies on emerging memory technologies for server processors. He received his Ph.D. (2007) and M.S. (2003), both in Electrical Engineering from the University of Texas at Austin, and his Bachelor of Electronics Engineering (2000) degree from University of Mumbai.