Crowdsourcing Service-level Network Event Monitoring

The user experience for networked applications is becoming a key benchmark for customers and network providers. Perceived user experience is largely determined by the frequency, duration and severity of network events that impact a service. While today’s networks implement sophisticated infrastructure that issues alarms for most failures, there remains a class of silent outages (e.g., caused by configuration errors) that are not detected. Further, existing alarms provide little information to help operators understand the impact of network events on services. Attempts to address this through infrastructure that monitors end-to-end performance for customers have been hampered by the cost of deployment and by the volume of data generated by these solutions. We present an alternative approach that pushes monitoring to applications on end systems and uses their collective view to detect network events and their impact on services - an approach we call Crowdsourcing Event Monitoring (CEM). This paper presents a general framework for CEM systems and demonstrates its effectiveness for a P2P application using a large dataset gathered from BitTorrent users and confirmed network events from two ISPs. We discuss how we designed and deployed a prototype CEM implementation as an extension to BitTorrent. This system performs online service-level network event detection through passive monitoring and correlation of performance in end-users’ applications.