TidyFS is the research prototype for the Distributed Storage Catalog (DSC), the distributed storage system that is shipping as a component for data-intensive computing on Windows HPC Server 2008 R2. The two systems share the same general design principles.
TidyFS is a simple distributed file system that provides the abstractions necessary for data-parallel computations on clusters. The prototypical workload for this system is high-throughput, write-once, sequential I/O. The primary user-visible unit of storage is the stream, which is a sequence of parts distributed across the local storage of machines in the cluster. The TidyFS metadata server maps each stream to its sequence of parts. It also tracks the location of every part replica in the system, the state of each storage machine in the cluster, and per-stream and per-partition attributes. The figure below presents a diagram of the system architecture, along with a sample cluster configuration and stream. The TidyFS design is described in detail in the TidyFS TechReport.
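To make the bookkeeping described above concrete, here is a minimal in-memory sketch of the kind of state a metadata server of this shape might hold. It is illustrative only and is not the TidyFS or DSC API: the class name `MetadataServer`, the methods `add_machine`, `append_part`, and `locate`, and the state labels `"ready"` and `"offline"` are all assumptions introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Toy stand-in for the metadata described in the text (hypothetical names)."""

    # stream name -> ordered list of part ids (a stream is a sequence of parts)
    streams: dict[str, list[int]] = field(default_factory=dict)
    # part id -> names of machines holding a replica of that part
    replicas: dict[int, set[str]] = field(default_factory=dict)
    # machine name -> state label ("ready" / "offline" are assumed labels)
    machine_state: dict[str, str] = field(default_factory=dict)
    # per-stream and per-part attributes as free-form key/value pairs
    stream_attrs: dict[str, dict[str, str]] = field(default_factory=dict)
    part_attrs: dict[int, dict[str, str]] = field(default_factory=dict)
    _next_part_id: int = 0

    def add_machine(self, name: str, state: str = "ready") -> None:
        """Register a storage machine and its current state."""
        self.machine_state[name] = state

    def append_part(self, stream: str, machines: list[str]) -> int:
        """Append a new part to a stream and record where its replicas live."""
        unknown = [m for m in machines if m not in self.machine_state]
        if unknown:
            raise KeyError(f"unknown machines: {unknown}")
        part_id = self._next_part_id
        self._next_part_id += 1
        self.streams.setdefault(stream, []).append(part_id)
        self.replicas[part_id] = set(machines)
        return part_id

    def locate(self, stream: str) -> list[tuple[int, list[str]]]:
        """For each part, in stream order, list the ready machines holding it."""
        return [
            (pid, sorted(m for m in self.replicas[pid]
                         if self.machine_state[m] == "ready"))
            for pid in self.streams[stream]
        ]


md = MetadataServer()
for name in ("m1", "m2", "m3"):
    md.add_machine(name)
md.append_part("logs", ["m1", "m2"])
md.append_part("logs", ["m2", "m3"])
md.machine_state["m2"] = "offline"
print(md.locate("logs"))  # [(0, ['m1']), (1, ['m3'])]
```

A client reading a stream would ask for the part sequence and replica locations, then read each part directly from the local storage of one of the listed machines. A real metadata server would also need persistence and replication of this state, which the sketch leaves out.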