NCSA CyberSecurity


Mithril: Applying Adaptability for Survivability

Collaborative scientific computing sites, such as the NRL Center for Computational Science, NSF computing sites (NCSA, SDSC, PSC, NCAR), and similar labs in DOE (e.g. NERSC, LBNL), have large distributed user communities, spread both geographically (across the globe) and administratively. A constant threat to these computing sites is the compromise of their users' end systems. When such a compromise occurs, the attacker typically captures user credentials (e.g. SSH keys or passwords) stored or used on that system and uses them to gain illicit access to the computing site.

Under normal day-to-day operation, production security teams at the computing sites handle a small but continuous stream of account compromises that originate from these user systems. They do so manually: detecting the compromises (by monitoring audit logs), revoking compromised credentials, and working with the end user and their administrators to restore the integrity of the compromised system. However, some incidents overwhelm this day-to-day process. One example occurred in the summer of 2004 and is referred to as Incident 216 (the name comes from an internal FBI designation of the case). In Incident 216, the attackers compromised so many user end systems that site security personnel could not keep up with detecting the compromises and arranging for the restoration of system integrity. In the face of this incident, many sites were forced to take their own systems, or even their entire site, off the net because they could not maintain integrity.

Incident 216 illustrates the situation faced by collaborative computing sites: their security measures and mechanisms are sufficient to maintain an acceptable operational state under normal conditions, but extreme situations overwhelm these measures and mechanisms, leaving the sites unable to maintain their integrity. A natural reaction is to raise security at sites to a level sufficient to protect against Incident 216-like attackers. This is akin to establishing a security perimeter around a hazardous area and allowing only limited, authorized personnel to enter the area to respond to the hazard and to enable continuity of essential services. However, as we discuss subsequently, this brings significant costs, both in purchasing and supporting new technologies and in decreased usability for users.

The Mithril project focuses on applying survivability research to standard open source software so that such sites can continue to operate and serve customers in the face of an extraordinary attack, by temporarily and gracefully reducing their level of service while raising their level of security. We will develop a set of integrated security enhancements that not only increase day-to-day security, but also allow dynamic, temporary adaptations in security in response to a heightened level of threat. These enhancements will allow a site to maintain a high level of openness and usability during normal periods of operation, but respond quickly to increased threat levels with increased security, while still continuing to serve key customers.

The following paper summarizes our accomplishments and insights:
Mithril: An Experiment in Adaptive Security

Mithril is a collaboration between NCSA, PNNL, and the NRL Center for Computational Science (CCS). NCSA and PNNL will lead the research and development efforts, with NRL CCS providing requirements and evaluations to ensure the applicability of our work to NRL. NCSA will provide overall management for the project.

Project Staff:

This project is funded by the Office of Naval Research (ONR) grant N0001404-1-0562 through the National Center for Advanced Secure Systems Research. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of ONR.