2008-09-13
分布式、集群文件系统小结
作者:花开 发布时间: 2008-09-13 | 版权声明:可以任意转载,转载时请务必以超链接形式标明文章原始出处、作者信息及本声名。
本文链接: http://www.bsdmap.com/2008/09/13/dfs/
顺序不分先后:
Lustre
Lustre is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by Sun Microsystems, Inc.
Designed to meet the demands of the world’s largest high-performance compute clusters, the Lustre file system redefines scalability and provides groundbreaking I/O and metadata throughput. An object-based cluster, Lustre currently supports tens of thousands of nodes, petabytes of data, and billions of files — and development is underway to support one million nodes, trillions of files, and zetta to yotta bytes.
http://www.sun.com/software/products/lustre/
http://wiki.huihoo.com/index.php?title=Lustre
OpenAFS
What is AFS?
AFS is a distributed filesystem product, pioneered at Carnegie Mellon University and supported and developed as a product by Transarc Corporation (now IBM Pittsburgh Labs). It offers a client-server architecture for file sharing, providing location independence, scalability and transparent migration capabilities for data. OpenAFS is the Transarc source code released as it looked like around AFS3.6 under IBM Public License IPL.
Arla
Arla is a free AFS implementation.
The main goal is to make a fully functional client with all capabilities of AFS as formerly sold by Transarc and today available as OpenAFS. Other stuff, such as servers and management tools are being developed, but currently not considered stable.
Coda
Coda分布式文件系统:http://www.bsdmap.com/diary/coda.php
Coda File System http://www.coda.cs.cmu.edu/
Coda is a forked of version of AFS that support disconnected and weakly connected mode better then AFS.
InterMezzo
InterMezzo is a new distributed file system with a focus on high availability. InterMezzo will be suitable for replication of servers, mobile computing, managing system software on large clusters, and for maintenance of high availability clusters.
xFS
xFS is a Serverless Network File Service.
CFS
Cluster File Systems, Inc. is the leading developer of next generation technology for scalable high-performance file systems. Our Lustre® file system redefines scalability and has been designed from the ground up to meet the demands of the world’s largest high-performance computer clusters.
GlusterFS
GlusterFS is a cluster file-system capable of scaling to several peta-bytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. GlusterFS is based on a stackable user space design without compromising performance.
Scalable File Share
HP StorageWorks Scalable File Share
A high-bandwidth, scalable storage appliance for Linux clusters
http://h20311.www2.hp.com/HPC/cache/276636-0-0-0-121.html
MogileFS
MogileFS is our open source distributed filesystem. Its properties and features include:
-1. Application level
-2. No single point of failure
-3. Autumaic file replication
-4. “Better than RAID”
-5. Flat Namespace
-6. Shared-Nothing
-7. No RAID required
-8. Local filesystem agnostic
Hadoop
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing, including:
* Hadoop Core, our flagship sub-project, provides a distributed filesystem (HDFS) and support for the MapReduce distributed computing metaphor.
* HBase builds on Hadoop Core to provide a scalable, distributed database.
* ZooKeeper is a highly available and reliable coordination system. Distributed applications use ZooKeeper to store and mediate updates for critical shared state.
PVFS
http://www.pvfs.org/
http://www.parl.clemson.edu/pvfs/
PVFS is designed to provide high performance for parallel applications, where concurrent, large IO and many file accesses are common. PVFS provides dynamic distribution of IO and metadata, avoiding single points of contention, and allowing for scaling to high-end terascale and petascale systems.
http://en.wikipedia.org/wiki/Global_File_System
http://www.redhat.com/docs/manuals/csgfs/
GFS (Global File System) is a cluster file system. It allows a cluster of computers to simultaneously use a block device that is shared between them (with FC, iSCSI, NBD, etc…). GFS reads and writes to the block device like a local filesystem, but also uses a lock module to allow the computers coordinate their I/O so filesystem consistency is maintained. One of the nifty features of GFS is perfect consistency — changes made to the filesystem on one machine show up immediately on all other machines in the cluster.
See also
- Comparison of file systems
- GPFS, ZFS
- Lustre
- GlusterFS
- List of file systems
- Oracle cluster file system (OCFS)
- SAN file system
- Fencing
External links About GFS
- Red Hat GFS Product Page
- Red Hat Cluster Suite and GFS Documentation Page
- GFS Project Page
- DataCore’s GFS Informations
- OpenGFS Project Page
- The GFS2 development git tree
1. HP OpenVMS
————–
The first to work with a CFS is HP OpenVMS. Oracle Parallel Server and RAC always used
the OpenVMS filesystem (RMS) for its database.
2 HP Tru64
————
CFS is a layer on top of Advfs the filesystem of HP Tru64. Oracle uses
the Direct I/O feature available in CFS. Direct I/O enables Oracle to bypass
the buffer cache (no caching at filesystem level). Oracle manages the
concurrent access to the file itself; as it does on raw devices. On CFS,
without Direct I/O enabled on files - file access goes through a CFS server.
A CFS server runs on a cluster member and serves a file domain. A file
domain can be relocated from one cluster member to another cluster member
online. A file domain may contain one or more filesystems.
Direct I/O does not go through the CFS server, but file creation and resizing
is seen as metadata operation by advfs and this has to be done by the CFS
server. The consequence is to run file creations and resizing on the node
where the CFS server is located. File operations might take longer when the
CFS server is remote.
Oracle recommends not using the tempfile option, as tempfiles might not be
allocated until the tempfile blocks are accessed and so cause
‘remote metadata operations’ for advfs.
3 Veritas
———–
VERITAS Database EditionTM / Advanced Cluster for Oracle9i RAC enables Oracle
to use the CFS. The VERITAS Cluster File System is an extension of the VERITAS
File System (VxFS). Veritas CFS allows the same filesystem to be simultaneously
mounted on multiple nodes. Veritas CFS is designed with a master/slave
architecture. Any node can initiate a metadata operation (create, delete, or
resize data), the actual operation is carried out by the master node. All other
(non metadata) IO goes directly to the disk.
CFS is used in DBE/AC to manage a filesystem in a large database environment.
When used in DBE/AC for Oracle9i RAC, Oracle accesses data files stored on CFS
filesystems by bypassing the filesystem buffer and filesystem locking for data.
4 Oracle Cluster File System
——————————
Oracle Cluster File System (OCFS) is a shared filesystem designed specifically
for Oracle Real Application Clusters. OCFS eliminates the requirement for Oracle
database files to be linked to logical drives and enables all nodes to share a
single Oracle_Home (current capabilities are detailed in section 2.
instead
of requiring each node to have its own local copy. OCFS volumes can span one
shared disk or multiple shared disks for redundancy and performance
enhancements.
5. Netapp(R) Filer
——————-
Netapp Filer offers CFS functionality via NFS to the server machines. These
filesystems are mounted using special mount options. For details please see
Netapp documentation.
Netapp certifications can be found at:
http://www.netapp.com/part…
To understand the architecture and Oracle installation please see these
documents:
Note 210889.1: RAC Installation with a NetApp Filer in Red Hat Linux Environment
and
Oracle9i RAC Installation with a NetApp Filer on Fujitsu-Siemens Primepower
(Solaris8 Operating System) at http://www.netapp.com/tech…
6 AIX
——-
IBM’s General Parallel File System (GPFS) allows users shared access to files
that may span multiple disk drives on multiple nodes. GPFS provides access to
all data from all nodes of the cluster. It can be configured with multiple
copies of metadata allowing continued operation should the paths to a disk or
the disk itself be broken. Metadata is the filesystem data that describes
the user data. GPFS allows the use of RAID or other hardware redundancy
capabilities to enhance reliability.
In Oracle9i GPFS is only supported with HACMP/ES in a RAC configuration.
When placing datafiles on GPFS no CRM (Concurrent Resource Manager) needs to be
installed. Starting with Oracle10g HACMP is no longer required to use GPFS.
Metalink contains certification information and information about required
patches for having a cluster database on a GPFS.
7 Sun GFS
———–
Global File Service (GFS or Cluster File System) is a filesystem that is
accessible from all nodes in the cluster. GFS is based on global devices and
has a client/server architecure. GFS provides transparent and concurrent file
access.
Note that Sun GFS is not supported for Oracle datafiles, see section 3.10.
8 Sun StorEdge QFS
——————–
QFS software is a file manager that provides a shared filesystem where mutiple
servers can read and write simultanuously to the same file in the same filesystem.
9 Other Linux Cluster Filesystems
———————————–
There are various third party cluster filesystems available on Linux.
Consult the Oracle Certify website for the policy regarding support for third party
cluster file systems on Linux. Also, consult the RAC Technology Compatibility Matrix (RTCM)
for Linux (http://www.oracle.com/tech… … generic_linux.html)
for the latest information on which third party cluster file systems are supported
by RAC release and platform.
10 Which Platforms support what?
———————————-
Platform and Storage for Storage for
[Cluster Software] Oracle installation datafiles
AIX [HACMP] LFS (1) or CFS (2) CFS and/or Raw devices
AIX [CRS] LFS or CFS CFS and/or Raw devices
HP/UX [MC/Service Guard] LFS or CFS (3) CFS (3) and/or Raw Devices
HP/UX PA-Risc [Veritas DBE/AC) LFS or CFS CFS and/or Raw Devices
Linux [oracm, CRS] LFS OCFS (4) and/or Raw
Devices, also NFS (5)
OpenVMS CFS CFS
Sun Solaris [Fujitsu Siemens LFS Raw Devices/NFS (5)
Primecluster]
Sun Solaris [Sun Cluster] LFS or CFS (6,7) CFS (7) Raw Devices/NFS (5)
Sun Solaris [Veritas DBE/AC] LFS or CFS CFS and/or Raw Devices
Tru64 Unix LFS or CFS CFS and/or Raw Devices
Windows NT/2000 [oracm, CRS] LFS or CFS OCFS and/or Raw Devices
Windows 2003 (32/64bit) [oracm, CRS] LFS or CFS OCFS and/or Raw Devices
(1) LFS is the abbreviation for local filesystem and is only accessible directly
by the node that mounted the disk
(2) CFS is the abbreviation for Cluster FileSystem. The implementation
depends on the operating software vendor or cluster software vendor.
(3) MC ServiceGuard 11.17 includes a CFS which is supported with Oracle 10gR2
(4) OCFS: Oracle Cluster FileSystem
(5) NFS is supported with Netapp(R) Filer, see Metalink certification
(6) Sun GFS can only be used for Oracle_Home and archivelogs.
(7) Sun StorEdge QFS
Local Filesystem means that the Oracle Universal Installer replicates the
RAC software installation automatically to every private filesystem of the
selected nodes in the cluster. The Oracle installation products
are cluster aware and will not install the Oracle software to over-write itself.
Oracm is the Oracle Cluster manager, which is available on Linux and Windows
NT/2000. No other cluster manager is needed to setup Real Application Cluster.
Cluster Ready Services (CRS) are new in Oracle10g and provide also clustermanager
functionality.
Oracle will validate cluster filesystems of other vendors when they become
available. Oracle will support the Oracle software when running on a validated
cluster filesystem.
11 Cluster File System names
——————————
PLatform or Cluster Vendor CFS name
AIX GPFS
HP/UX MC/ServiceGuard CFS
Linux [oracm, CRS] OCFS
OpenVMS RMS
Tru64 Unix CFS
SunCluster GFS, QFS
Veritas DBE/AC CFS
Windows NT/2000 OCFS
Windows 2003 (32/64bit) OCFS
For more information on certified configuration please see the certification
matrix available on Metalink. Instructions for accessing the certification
matrix can be found in the following note:
Note 184875.1
How To Check The Certification Matrix for Real Application Clusters
12 When to use CFS over raw?
——————————
This option is very dependent on the availability of a CFS on your platform.
A CFS offers:
- Simpler management
- Use of Oracle Managed Files with RAC
- Single Oracle Software installation
- Autoextend enabled on Oracle datafiles
- Uniform accessibility to archive logs in case of physical node failure
- With Oracle_Home on CFS, when you apply Oracle patches CFS guarantees that
the updated Oracle_Home is visible to all nodes in the cluster.