California Fault Lines: Understanding the Causes and Impact of Network Failures

  • Alex C. Snoeren, University of California, San Diego; Stefan Saroiu, Microsoft

Of the major factors affecting end-to-end service availability, network component failure is perhaps the least well understood. How often do failures occur, how long do they last, what are their causes, and how do they impact customers? Traditionally, answering questions such as these has required dedicated (and often expensive) instrumentation broadly deployed across a network.

We are exploring an alternative approach: opportunistically mining “low-quality” data sources that are already available in modern network environments. This talk will describe a methodology for recreating a succinct history of failure events in a wide-area IP network using a combination of structured data (router configurations and syslogs) and semi-structured data (email logs). We have used this technique to analyze over five years of failure events in a large regional network consisting of over 200 routers; to our knowledge, this is the largest study of its kind. Currently, we are conducting similar evaluations in a number of enterprise environments.
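As a rough illustration of the kind of reconstruction described above, the sketch below pairs link "down" and "up" messages into failure intervals. It is not the authors' method. The one-line log format (`timestamp link state`), the router and interface names, and the names `Failure` and `reconstruct_failures` are all assumptions made for this example; real syslogs are far messier and would need to be cross-checked against router configurations and email logs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Failure:
    link: str
    start: datetime
    end: datetime

    @property
    def duration_s(self) -> float:
        # Length of the outage in seconds.
        return (self.end - self.start).total_seconds()


def reconstruct_failures(lines):
    """Pair each DOWN message with the next UP message for the same link.

    Assumes a hypothetical, simplified format: "<ISO timestamp> <link> <UP|DOWN>".
    """
    down_since = {}  # link -> time of the earliest unmatched DOWN message
    failures = []
    # ISO 8601 timestamps sort chronologically as plain strings.
    for line in sorted(lines):
        ts_text, link, state = line.split()
        ts = datetime.fromisoformat(ts_text)
        if state == "DOWN":
            # Keep the earliest DOWN; repeated DOWN messages are ignored.
            down_since.setdefault(link, ts)
        elif state == "UP" and link in down_since:
            failures.append(Failure(link, down_since.pop(link), ts))
    return failures


# Hypothetical sample data: out of order and with a duplicate DOWN message.
SAMPLE_LOG = [
    "2010-03-01T10:00:00 rtrA:ge-0/0/1 DOWN",
    "2010-03-01T10:00:05 rtrB:ge-1/0/0 DOWN",
    "2010-03-01T10:02:30 rtrA:ge-0/0/1 UP",
    "2010-03-01T10:00:07 rtrB:ge-1/0/0 DOWN",
    "2010-03-01T10:10:05 rtrB:ge-1/0/0 UP",
]

failures = reconstruct_failures(SAMPLE_LOG)
```

Each entry in `failures` carries a link name, a start time, and an end time, from which failure frequency and duration can be tallied. A link that goes down and never logs a matching "up" message is left unreported here; a fuller treatment would have to decide how to handle such lost or ambiguous messages.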

Speaker Details

Alex C. Snoeren is an Associate Professor in the Computer Science and Engineering Department at the University of California, San Diego, where he is a member of the Systems and Networking Research Group. His research interests include operating systems, distributed computing, and mobile and wide-area networking. Professor Snoeren received a Ph.D. in Computer Science from the Massachusetts Institute of Technology (2003), and an M.S. in Computer Science (1997) and Bachelor of Science degrees in Computer Science (1996) and Applied Mathematics (1997) from the Georgia Institute of Technology. He is a recipient of the Alfred P. Sloan Fellowship (2009), a National Science Foundation CAREER Award (2004), the MIT EECS George M. Sprowls Doctoral Dissertation Award (Honorable Mention, 2003), and best-paper awards at the ACM SIGCOMM (2001, 2007) and USENIX OSDI (2008) conferences.