Storage solutions can generally be grouped into four categories: SoHo NAS systems, Cloud-based/object solutions, Enterprise NAS and SAN solutions, and Microsoft Storage Server solutions. Enterprise NAS and SAN solutions are generally closed systems offered by traditional vendors like EMC and NetApp with a very large price tag, so many businesses are looking at Open Source solutions to meet their needs. This is a collection of links and brief descriptions of Open Source storage solutions currently available. Open Source of course means it’s free to use and modify; however, some projects do have commercially supported versions as well for enterprise customers who require them.
Why would an enterprise business consider an Open Source storage solution? The most obvious reason is that it’s free, and any developer can customize it to suit the needs of the business. With the right people on board, innovation can be rapid. Unfortunately, as is the case with most open source software, it can be needlessly complex and difficult to use, require expert or highly trained staff, and have compatibility issues, and most projects don’t offer the support and maintenance that enterprise customers require. There’s no such thing as a free lunch, as they say, and using Open Source generally requires compromising on support and maintenance. I’d see some of these solutions as perfect for an enterprise development or test environment, and as an easy way for a larger company to allow their staff to get their feet wet in a new technology to see how it may be applied as a potential future solution. As I mentioned, tested and supported versions of some open source storage software are available, which can ease the concerns regarding deployment, maintenance and support.
I have the solutions loosely organized into Open Source NAS and SAN Software, File Systems, RAID, Backup and Synchronization, Cloud Storage, Data Destruction, Distributed Storage/Big Data Tools, Document Management, and Encryption tools.
Open Source NAS and SAN Software Solutions
Backblaze is an object data storage provider. Backblaze stores data on its customized, open source hardware platform called Storage Pods, and its cloud-based Backblaze Vault file system. It is compatible with Windows and Apple OSes. While they are primarily an online backup service, they opened up their Storage Pod design starting in 2009, which uses commodity hardware that anyone can build. The pods are self-contained 4U data storage servers. It’s interesting stuff and worth a look.
Enterprise Storage OS is a linux distribution based on the SCST project with the purpose of providing SCSI targets via a compatible SAN (Fibre Channel, InfiniBand, iSCSI, FCoE). ESOS can turn a server with the appropriate hardware into a disk array that sits on your enterprise Storage Area Network (SAN) and provides sharable block-level storage volumes.
OpenIO is an open source object storage startup founded in 2015 by CEO Laurent Denel and six co-founders. The product is an object storage system for applications that scales from terabytes to exabytes. OpenIO specializes in software defined storage and scalability challenges, with experience in designing and running cloud platforms. It owns a general purpose object storage and data processing solution adopted by large companies for massive production.
Open vStorage is an open-source, scale-out, reliable, high performance, software based storage platform which offers a block & file interface on top of a pool of drives. It is a virtual appliance (called the “Virtual Storage Router”) that is installed on a host or cluster of hosts on which Virtual Machines are running. It adds value and flexibility in a hyper converged / Open Stack provider deployment where you don’t necessarily want to be tied to a solution like VMware VSAN. Being hypervisor agnostic is a key advantage of Open vStorage.
openATTIC is an Open Source Ceph and storage management solution for Linux, with a strong focus on storage management in a datacenter environment. It allows for easy management of storage resources, features a modern web interface, and supports NFS, CIFS, iSCSI and FC. It supports a wide range of file systems including Btrfs and ZFS, as well as automatic data replication using DRBD (the Distributed Replicated Block Device) and automatic monitoring of shares and volumes using a built-in Nagios/Icinga instance. openATTIC 2 will support managing the Ceph distributed object store and file system.
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
The OpenStack Object Storage (swift) service provides software that stores and retrieves data over HTTP. Objects (blobs of data) are stored in an organizational hierarchy that offers anonymous read-only access, ACL defined access, or even temporary access. Object Storage supports multiple token-based authentication mechanisms implemented via middleware.
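The temporary access mentioned above is usually handed out as a signed, time-limited URL. Here is a minimal sketch of how such a URL can be built, assuming the classic HMAC-SHA1 form checked by Swift’s TempURL middleware (some deployments may require a stronger digest). The host name, account, object path and key are made up for illustration.

```python
import hmac
from hashlib import sha1


def make_temp_url(base_url, path, key, expires, method="GET"):
    """Build a time-limited URL for a single object.

    The signature is an HMAC-SHA1 over the HTTP method, the expiry
    timestamp and the object path, joined by newlines.
    """
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return f"{base_url}{path}?temp_url_sig={sig}&temp_url_expires={expires}"


# Hypothetical cluster, account and key, for illustration only.
url = make_temp_url(
    base_url="https://swift.example.com",
    path="/v1/AUTH_demo/reports/q1.pdf",
    key="not-a-real-key",
    expires=1700000000,
)
print(url)
```

Anyone holding the URL can fetch that one object until the expiry time passes, without needing an account token.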
CryptoNAS (formerly CryptoBox) is one NAS project that makes encrypting your storage quick and easy. It is a multilingual Debian based Linux live CD with a web based front end that can be installed onto a hard disk or USB stick. CryptoNAS offers a choice of encryption algorithms, with AES as the default. It encrypts disk partitions using LUKS (Linux Unified Key Setup), which means that any Linux operating system can also access them without using CryptoNAS software.
Ceph is a distributed object store and file system designed to provide high performance, reliability and scalability. It’s built on the Reliable Autonomic Distributed Object Store (RADOS) and allows enterprises to build their own economical storage devices using commodity hardware. It has been maintained by Red Hat since their acquisition of Inktank in April 2014. It’s capable of block, object, and file storage. It is scale-out, meaning multiple Ceph storage nodes will present a single storage system that easily handles many petabytes, and performance and capacity increase simultaneously. Ceph has many basic enterprise storage features including replication (or erasure coding), snapshots, thin provisioning, auto-tiering and self-healing capabilities.
The FreeNAS website touts itself as “the most potent and rock-solid open source NAS software,” and it counts the United Nations, The Salvation Army, The University of Florida, the Department of Homeland Security, Dr. Phil, Reuters, Michigan State University and Disney among its users. You can use it to turn standard hardware into a BSD-based NAS device, or you can purchase supported, pre-configured TrueNAS appliances based on the same software.
RockStor is a free and open source NAS (Network Attached Storage) solution. Its Personal Cloud Server is a powerful local alternative to public cloud storage that mitigates the cost and risks of public cloud storage. This NAS and cloud storage platform is suitable for small to medium businesses and home users who don’t have much IT experience, but who may need to scale to terabytes of data storage. If you are more interested in Linux and Btrfs, it’s a great alternative to FreeNAS. The RockStor NAS and cloud storage platform can be managed within a LAN or over the Web using a simple and intuitive UI, and with the inclusion of add-ons (fittingly named ‘Rockons’), you can extend the feature set of your Rockstor to include new apps, servers, and services.
Red Hat-owned Gluster is a distributed scale-out network attached storage file system that can handle really big data: up to 72 brontobytes. It has found applications including cloud computing, streaming media services and content delivery networks. It promises high availability and performance, an elastic hash algorithm, an elastic volume manager and more. GlusterFS aggregates various storage servers over Ethernet or InfiniBand RDMA interconnect into one large parallel network file system.
72 Brontobytes? I admit that I hadn’t seen that term used yet in any major storage vendor’s marketing materials. How big is that? Really, really big.
1 Bit = Binary Digit
8 Bits = 1 Byte
1,000 Bytes = 1 Kilobyte
1,000 Kilobytes = 1 Megabyte
1,000 Megabytes = 1 Gigabyte
1,000 Gigabytes = 1 Terabyte
1,000 Terabytes = 1 Petabyte
1,000 Petabytes = 1 Exabyte
1,000 Exabytes = 1 Zettabyte
1,000 Zettabytes = 1 Yottabyte
1,000 Yottabytes = 1 Brontobyte
1,000 Brontobytes = 1 Geopbyte
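To put a number on “really, really big”, here is a small sketch that walks the unit ladder above in steps of 1,000 and converts Gluster’s 72 brontobytes into bytes. The helper name and the unit list are mine, not part of any library.

```python
# Units in ascending order; each step is a factor of 1,000, as in the list above.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte",
         "exabyte", "zettabyte", "yottabyte", "brontobyte", "geopbyte"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


gluster_limit = to_bytes(72, "brontobyte")
print(gluster_limit)                              # 72 followed by 27 zeros
print(gluster_limit // to_bytes(1, "petabyte"))   # the same limit in petabytes
```

A brontobyte is nine steps up the ladder, so 72 of them come to 72 x 10^27 bytes.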
Like FreeNAS, NAS4Free allows you to create your own BSD-based storage solution from commodity hardware. It promises a low-cost, powerful network storage appliance that users can customize to their own needs.
If FreeNAS and NAS4Free sound suspiciously similar, it’s because they share a common history. Both started from the same original FreeNAS code, which was created in 2005. In 2009, the FreeNAS team pursued a more extensible plugin architecture using OpenZFS, and a project lead who disagreed with that direction departed to continue his work using Linux. NAS4Free, for its part, stayed on BSD and carried the original FreeNAS codebase forward. NAS4Free dispenses with the fancy stuff and sticks with a more focused approach of “do one thing and do it well”. You don’t get bittorrent clients or cloud servers and you can’t make a virtual server with it, but many feel that NAS4Free has a much cleaner, more usable interface.
Openfiler is a storage management operating system based on rPath Linux. It is a full-fledged NAS/SAN that can be implemented as a virtual appliance for VMware and Xen hypervisors. It offers storage administrators a set of powerful tools that are used to manage complex storage environments. It supports software and hardware RAID, monitoring and alerting facilities, and volume snapshot and recovery features. Configuring Openfiler can be complicated, but there are many online resources available that cover the most typical installations. I’ve seen mixed reviews about the product online, so it’s worth a bit of research before you consider an implementation.
OpenSMT is an open source storage management toolkit based on OpenSolaris. Like Openfiler, OpenSMT also allows users to use commodity hardware for a dedicated storage device with NAS and SAN features. It uses the ZFS filesystem and includes a well-designed Web GUI.
OpenMediaVault is a NAS solution based on Debian Linux that offers plug-ins to extend its capabilities. It boasts really easy-to-use storage management with a web based interface, fast setup, multilanguage support, volume management, monitoring, UPS support, and statistics reporting. Plugins allow it to be extended with LDAP support, bittorrent, and iSCSI. It is primarily designed to be used in small offices or home offices, but is not limited to those scenarios.
The Turnkey Linux Virtual Appliance Library is a free open source project which has developed a range of Debian based pre-packaged server software appliances (a.k.a. virtual appliances). Turnkey appliances can be deployed as a virtual machine (a range of hypervisors are supported), in cloud computing infrastructures (including AWS and others) or installed in physical computers.
Turnkey offers more than 100 different software appliances based on open source software. Among them is a file server that offers simple network attached storage, hence its inclusion in this list.
Turnkey file server is an easy to use file server that combines Windows-compatible network file sharing with a web based file manager. TurnKey File Server includes support for SMB, SFTP, NFS, WebDAV and rsync file transfer protocols. The server is configured to allow server users to manage files in private or public storage. It is based on Samba and SambaDAV.
oVirt is a free, open-source virtualization management platform. It was founded by Red Hat as a community project on which Red Hat Enterprise Virtualization is based. It allows centralized management of virtual machines, compute, storage and networking resources from an easy to use web-based front-end with platform independent access. It’s based on the KVM hypervisor.
Backed by companies like EMC, Seagate, Toshiba, Cisco, NetApp, Red Hat, Western Digital, Dell and others, Kinetic is a Linux Foundation project dedicated to establishing standards for a new kind of object storage architecture. It’s designed to meet the need for scale-out storage for unstructured data. Kinetic is fundamentally a way for storage applications to communicate directly with storage devices over Ethernet. With Kinetic, storage use cases that are targeted consist largely of unstructured data like NoSQL, Hadoop and other distributed file systems, and object stores in the cloud like Amazon S3, OpenStack Swift and Basho’s Riak.
Storj (pronounced “Storage”) is a new type of cloud storage built on blockchain and peer-to-peer technology. Storj offers decentralized, end-to-end encrypted cloud storage. The DriveShare app allows users to rent out their unused hard drive space for use by the service, and the MetaDisk Web app allows users to save their files to the service securely.
The core protocol allows for peer to peer negotiation and verification of storage contracts. Providers of storage are called “farmers” and those using the storage, “renters”. Renters periodically audit whether the farmers are still keeping their files safe and, in a clever twist of similar architectures, immediately pay out a small amount of cryptocurrency for each successful audit. Conversely, farmers can decide to stop storing a file if its owner does not audit and pay their services on time. Files are cut up into pieces called “shards” and stored 3 times redundantly by default. The network will automatically determine a new farmer and move data if copies become unavailable. In the core protocol, contracts are negotiated through a completely decentralized key-value store (Kademlia). The system puts measures in place that prevent farmers and renters from cheating on each other, e.g. through manipulation of the auditing process. Other measures are taken to prevent attacks on the protocol itself.
Storj, like other similar services, offers several advantages over more traditional cloud storage solutions: since data is encrypted and cut into “shards” at source, there is almost no conceivable way for unauthorized third parties to access that data. Data storage is naturally distributed and this, in turn, increases availability and download speed thanks to the use of multiple parallel connections.
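The shard-and-audit idea described above can be illustrated with a few lines of code. This is a deliberately simplified sketch, not Storj’s actual protocol: it assumes a plain challenge-response audit built on SHA-256 and leaves out encryption, payments, redundancy and the Kademlia network. All function names are hypothetical.

```python
import hashlib
import os


def split_into_shards(data, shard_size):
    """Cut a byte string into fixed-size shards (the last one may be shorter)."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def audit_response(challenge, shard):
    """What a farmer returns to show it still holds the shard."""
    return hashlib.sha256(challenge + shard).hexdigest()


# Renter side: shard the (already encrypted) file and precompute one audit answer.
data = b"example ciphertext " * 100
shards = split_into_shards(data, shard_size=256)
challenge = os.urandom(16)
expected = audit_response(challenge, shards[0])

# Farmer side: an honest farmer passes, one holding damaged data does not.
honest = audit_response(challenge, shards[0])
damaged = audit_response(challenge, shards[0][:-1] + b"\x00")
print(honest == expected, damaged == expected)
```

Because the challenge is random and the answer depends on every byte of the shard, a farmer cannot answer correctly without actually keeping the data.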
Open Source File Systems
Btrfs is a newer Linux filesystem being developed by Facebook, Fujitsu, Intel, the Linux Foundation, Novell, Oracle, Red Hat and some other organizations. It emphasizes fault tolerance and easy administration, and it supports files as large as 16 EiB.
It has been included in the Linux 3.10 kernel as a stable filesystem since July 2014. Because of the fast development speed, btrfs noticeably improves with every new kernel version, so it’s always recommended to use the most recent, stable kernel version you can. Rockstor always runs a very recent kernel for that reason.
One of the big draws of Btrfs is its Copy on Write (CoW) design. When multiple users attempt to read/write a file, it does not make a separate copy until changes are made to the original file by the user. This has the benefit of preserving earlier versions alongside the changes, which allows file restorations from snapshots. Btrfs also has its own native RAID support built in, appropriately named Btrfs-RAID. A nice benefit of the Btrfs RAID implementation is that a RAID6 volume does not need additional re-syncing upon creation of the RAID set, greatly reducing the time requirement.
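To make the Copy on Write idea concrete, here is a toy model in which a snapshot shares blocks with the live volume until one of them is rewritten. It is only an illustration of the concept, and the class and method names are hypothetical; Btrfs itself works on on-disk extents and B-trees, not Python dictionaries.

```python
class CowVolume:
    """Toy copy-on-write volume: snapshots share blocks until one is rewritten."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes currently visible
        self.snapshots = {}   # snapshot name -> frozen block map

    def write(self, block_no, data):
        # Rebinding the entry leaves any block held by a snapshot untouched.
        self.blocks[block_no] = data

    def snapshot(self, name):
        # Copies only the block map (references), not the block contents.
        self.snapshots[name] = dict(self.blocks)

    def restore(self, name):
        self.blocks = dict(self.snapshots[name])


vol = CowVolume()
vol.write(0, b"original")
vol.write(1, b"untouched")
vol.snapshot("before-edit")
vol.write(0, b"modified")

# Block 1 was never rewritten, so the volume and the snapshot share one object.
shared = vol.blocks[1] is vol.snapshots["before-edit"][1]
vol.restore("before-edit")
print(vol.blocks[0], shared)
```

Taking the snapshot costs almost nothing, and extra space is only consumed for the blocks that change afterwards.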
Ext4 is the latest version of one of the most popular filesystems for Linux. One of its key benefits is the ability to handle very large amounts of data: 16 TB maximum per file and 1 EB (exabyte, or 1 million terabytes) maximum per filesystem. It is the evolution of the most used Linux filesystem, Ext3. In many ways, Ext4 is a deeper improvement over Ext3 than Ext3 was over Ext2. Ext3 was mostly about adding journaling to Ext2, but Ext4 modifies important data structures of the filesystem, such as the ones that store the file data.
Owned by Red Hat, GlusterFS is a scale-out distributed file system designed to handle petabytes worth of data. Features include high availability, fast performance, global namespace, elastic hash algorithm and an elastic volume manager.
GlusterFS combines the unused storage space on multiple servers to create a single, large, virtual drive that you can mount like a legacy filesystem using NFS or FUSE on a client PC. It also provides the ability to add more servers or remove existing servers from the storage pool on the fly. GlusterFS functions like a “network RAID” device, and many RAID concepts are apparent during setup. It really shines when you need to store huge quantities of data, have redundant file storage, or write data very quickly for later access. Geo-replication lets you mirror data on a volume across the wire. The target can be a single directory or another GlusterFS volume. It can also handle multiple petabytes easily and is very easy to install and manage.
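The “elastic hash algorithm” means that a file’s location is computed from its name rather than looked up on a central metadata server. The sketch below shows the general idea only, assuming a simple modulo over a SHA-256 digest; GlusterFS itself assigns hash ranges to bricks per directory. The brick names are invented.

```python
import hashlib


def pick_brick(filename, bricks):
    """Map a file name to one brick by hashing the name.

    Assumption: a plain modulo over a SHA-256 digest, chosen for brevity.
    """
    digest = hashlib.sha256(filename.encode()).digest()
    return bricks[int.from_bytes(digest[:8], "big") % len(bricks)]


# Hypothetical bricks exported by three storage servers.
bricks = ["server1:/export/brick1", "server2:/export/brick1", "server3:/export/brick1"]
for name in ["report.pdf", "video.mkv", "notes.txt"]:
    print(name, "->", pick_brick(name, bricks))
```

Every client that knows the brick list computes the same answer for the same file name, so no lookup service sits in the data path.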
Designed for “the world’s largest and most complex computing environments,” Lustre is a high-performance scale-out file system. It boasts that it can handle tens of thousands of nodes and petabytes of data with very fast throughput.
Lustre file systems are highly scalable and can be part of multiple computer clusters with tens of thousands of client nodes, multiple petabytes of storage on hundreds of servers, and more than 1TB/s of aggregate I/O throughput. This makes Lustre file systems a popular choice for businesses with large data centers.
OpenZFS is an outstanding storage platform that encompasses the functionality of traditional filesystems, volume managers, and more, with consistent reliability, functionality and performance. This popular file system is incorporated into many other open source storage projects. It offers excellent scalability and data integrity, and it’s available for most Linux distributions.
IPFS is short for “Interplanetary File System,” and is an unusual project that uses peer-to-peer technology to connect all computers with a single file system. It aims to supplement, or possibly even replace, the Hypertext Transfer Protocol that runs the web now. According to the project owner, “In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository.”
IPFS isn’t exactly a well-known technology yet, even among many in the Valley, but it’s quickly spreading by word of mouth among folks in the open-source community. Many are excited by its potential to greatly improve file transfer and streaming speeds across the Internet.
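The Git comparison hints at the core idea: objects are addressed by a hash of their content rather than by the name of the server that holds them. Here is a toy content-addressed store to illustrate that, assuming a bare SHA-256 hex digest as the address. Real IPFS identifiers and its chunking of large files are more involved, and the class name is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the data itself."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address):
        data = self._objects[address]
        # A copy obtained from any peer can be verified by re-hashing it.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.put(b"hello, world")
second = store.put(b"hello, world")   # identical content, identical address
print(first == second, len(store._objects))
```

Since the address proves the content, it does not matter which peer serves the bytes, which is what makes BitTorrent-style swarming possible.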
Open Source RAID Solutions
DRBD is a distributed replicated storage system for the Linux platform. It is implemented as a kernel driver, several userspace management applications and some shell scripts. It is typically used in high availability (HA) computer clusters, but beginning with v9 it can also be used to create larger software defined storage pools with more of a focus on cloud integration. Support and training are available through the project owner, LinBit.
DRBD’s replication technology is very fast and efficient. If you can live with an active-passive setup, it is a solid storage replication solution. DRBD keeps data synchronized between multiple nodes, including nodes in different datacenters, and failover between two nodes is quick.
This piece of the Linux kernel makes it possible to set up and manage your own software RAID array using standard hardware. While it is terminal-based, it offers a wide variety of options for monitoring, reporting, and managing RAID arrays.
Raider applies RAID 1, 4, 5, 6 or 10 to hard drives. It is able to convert a single Linux system disk into a software RAID 1, 4, 5, 6 or 10 system with a simple two-pass command. Raider is a bash shell script that deals with specific oddities of several Linux distros (Ubuntu, Debian, Arch, Mandriva, Mageia, openSuSE, Fedora, Centos, PCLinuxOS, Linux Mint, Scientific Linux, Gentoo, Slackware…; see README) and uses Linux software RAID (mdadm) ( http://en.wikipedia.org/wiki/Mdadm and https://raid.wiki.kernel.org/ ) to execute the conversion.
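For readers new to the parity-based levels in that list (RAID 4 and 5), the redundancy comes from XOR: the parity block is the XOR of the data blocks in a stripe, so any single missing block can be recomputed from the rest. The sketch below shows the arithmetic only; it is not how mdadm or Raider is implemented, and the block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data disks
parity = xor_blocks(data_blocks)            # stored on a fourth disk

# Simulate losing the second disk and rebuilding it from the survivors.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(rebuilt == data_blocks[1])
```

RAID 1 and 10 skip the arithmetic and keep full mirror copies, while RAID 6 adds a second, independent parity so that two disks can fail.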
Open Source Backup and Synchronization Solutions
From their marketing staff… “Zmanda is the world’s leading provider of open source backup and recovery software. Our open source development and distribution model enables us to deliver the highest quality backup software such as Amanda Enterprise and Zmanda Recovery Manager for MySQL at a fraction of the cost of software from proprietary vendors. Our simple-to-use yet feature-rich backup software is complemented by top-notch services and support expected by enterprise customers.”
Zmanda offers a community and enterprise edition of their software. The enterprise edition of course offers a much more complete feature set.
The core of Amanda is the Amanda server, which handles all the backup operations, compression, indexing and configuration tasks. You can run it on any Linux server as it doesn’t cause any conflicts with any other processes, but it is recommended to run it on a dedicated machine, as that removes any associated processing loads from the client machines and prevents the backup from negatively affecting the client’s performance.
Overall it is an extremely capable file-level backup tool that can be customized to your exact requirements. While it lacks a GUI, the command line controls are simple and the level of control you have over your backups is exceptional. Because it can be called from within your own scripts, it can be incorporated into your own custom backup scheme no matter how complex your requirements are. Paid support and a cloud-based version are available through Zmanda, which is owned by Carbonite.
Areca Backup is a free backup utility for Windows and Linux. It is written in Java and released under the GNU General Public License. It’s a good option for backing up a single system and it aims to be simple and versatile. Key features include compression, encryption, filters and support for delta backup.
Backup is a system utility for Linux and Mac OS X, distributed as a RubyGem, that allows you to easily perform backup operations. It provides an elegant DSL in Ruby for modeling your backups. Backup has built-in support for various databases, storage protocols/services, syncers, compressors, encryptors and notifiers which you can mix and match. It was built with modularity, extensibility and simplicity in mind.
Designed for enterprise users, BackupPC claims to be “highly configurable and easy to install and maintain.” It backs up to disk only (not tape) and offers features that reduce storage capacity and IO requirements.
Another enterprise-grade open source backup solution, Bacula offers a number of advanced features for backup and recovery, as well as a fairly easy-to-use interface. Commercial support, training and services are available through Bacula Systems.
Similar to FlyBack (see below), Back in Time offers a very easy-to-configure snapshot backup solution. GUIs are available for both Gnome and KDE (4.1 or greater).
This tool makes it easier to coordinate and manage backups on your network. With the help of programs like rdiff-backup, duplicity, mysqlhotcopy and mysqldump, Backupninja offers common backup features such as remote, secure and incremental file system backups, encrypted backup, and MySQL/MariaDB database backup. You can selectively enable status email reports, and can back up general hardware and system information as well. One key strength of backupninja is a built-in console-based wizard (called ninjahelper) that allows you to easily create configuration files for various backup scenarios. The downside is that backupninja requires other “helper” programs to be installed in order to take full advantage of all its features. While backupninja’s RPM package is available for Red Hat-based distributions, backupninja’s dependencies are optimized for Debian and its derivatives. Thus it is not recommended to try backupninja for Red Hat based systems.
Short for “Backup Archiving Recovery Open Sourced,” Bareos is a 100% open source fork of the backup project from bacula.org. The fork has been in development since late 2010 and has a lot of new features. The source has been published on GitHub under the AGPLv3 license. It offers features like LTO hardware encryption, efficient bandwidth usage and practical console commands. A commercially supported version of the same software is available through Bareos.com.
Box Backup describes itself as “an open source, completely automatic, online backup system.” It creates backups continuously and can support RAID. Box Backup is stable but not yet feature complete. All of the facilities to maintain reliable encrypted backups and to allow clients to recover data are, however, already implemented and stable.
BURP, which stands for “BackUp And Restore Program,” is a network backup tool based on librsync and VSS. It’s designed to be easy to configure and to work well with disk storage. It attempts to reduce network traffic and the amount of space that is used by each backup.
Conceived as a replacement for True Image or Norton Ghost, Clonezilla is a disk imaging application that can do system deployments as well as bare metal backup and recovery. Two types of Clonezilla are available, Clonezilla live and Clonezilla SE (server edition). Clonezilla live is suitable for single machine backup and restore, while Clonezilla SE is for massive deployment and can clone many (40+) computers simultaneously. Clonezilla saves and restores only used blocks in the hard disk, which increases the clone efficiency. With some high-end hardware in a 42-node cluster, a multicast restore rate of 8 GB/min was reported.
Create Synchronicity’s claim to fame is its lightweight size: the figures quoted for it range from 180KB to 220KB. It is an easy, fast and powerful backup application with an intuitive interface for backing up standalone systems. It synchronizes files and folders and can schedule backups to keep your data safe. Plus, it’s open source, portable, and multilingual. Windows 2000, Windows XP, Windows Vista, and Windows 7 are supported. To run Create Synchronicity, you must install the .NET Framework, version 2.0 or later.
DAR is a command-line backup and archiving tool that uses selective compression (not compressing already compressed files) and strong encryption, can split an archive into several files of a given size, and provides on-the-fly hashing. DAR knows how to perform full, differential, incremental and decremental backups. It provides testing, diffing, merging, listing and of course data extraction from existing archives. The archive’s internal catalog allows very quick restoration of even a single file from a very large, possibly sliced, compressed and encrypted archive. DAR saves *all* UNIX inode types and takes care of hard links, sparse files and Extended Attributes (MacOS X file forks, Linux ACL, SELinux tags, user attributes). It has support for ssh and is suitable for tapes and disks (floppy, CD, DVD, hard disks, …). An optional GUI is available from the DarGUI project.
DirSync Pro is a small but powerful utility for file and folder synchronization. DirSync Pro can be used to synchronize the content of one or many folders recursively. Use DirSync Pro to easily synchronize files from your desktop PC to your USB stick (or external HD, PDA or notebook), then use that device to synchronize files to another desktop PC. It also features incremental backups, a user friendly interface, a powerful schedule engine, and real-time synchronization. It is written in Java.
Duplicati is designed to back up your network to a cloud computing service like Amazon S3, Microsoft OneDrive, Google Cloud or Rackspace. It includes AES-256 encryption and a scheduler, as well as features like filters, deletion rules, and transfer and bandwidth options. It saves space with incremental backups and data deduplication. You can run backups on any machine through the web-based interface or via the command line interface. It has an auto-updater.
Based on the librsync library, Duplicity creates encrypted archives and uploads them to remote or local servers. It can use GnuPG to encrypt and sign archives if desired.
Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.
The duplicity package also includes the rdiffdir utility. Rdiffdir is an extension of librsync’s rdiff to directories—it can be used to produce signatures and deltas of directories as well as regular files. These signatures and deltas are in GNU tar format.
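The signature-and-delta approach can be sketched in a few lines. This is a simplified illustration that assumes fixed, aligned blocks and SHA-256 hashes. librsync’s real algorithm also uses a rolling checksum so it can match blocks that have shifted position, which this sketch does not attempt. The function names are mine.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def signature(data):
    """Hash of every fixed-size block in the previous backup."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


def delta(old_sig, new_data):
    """One ('keep', index) or ('data', bytes) instruction per new block."""
    ops = []
    for n, i in enumerate(range(0, len(new_data), BLOCK)):
        block = new_data[i:i + BLOCK]
        same = n < len(old_sig) and old_sig[n] == hashlib.sha256(block).hexdigest()
        ops.append(("keep", n) if same else ("data", block))
    return ops


def patch(old_data, ops):
    """Rebuild the new file from the old file plus the delta."""
    return b"".join(old_data[v * BLOCK:(v + 1) * BLOCK] if kind == "keep" else v
                    for kind, v in ops)


old = b"aaaabbbbccccdddd"
new = b"aaaaXXXXccccdddd"
ops = delta(signature(old), new)
print(ops)
print(patch(old, ops) == new)
```

Note that the delta is computed from the signature alone, so the machine making the backup never needs to read the previous archive back from the server.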
Similar to Apple’s TimeMachine, FlyBack provides incremental backup capabilities and allows users to recover their systems from any previous time. The interface is very easy to use, but little customization is available. FlyBack creates incremental backups of files, which can be restored at a later date. FlyBack presents a chronological view of a file system, allowing individual files or directories to be previewed or retrieved one at a time. Flyback was originally based on rsync when the project began in 2007, but in October 2009 it was rewritten from scratch using Git.
An imaging and cloning solution, FOG makes it easy for administrators to back up networks of all sizes. FOG can be used to image Windows XP, Vista, Windows 7 and Windows 8 PCs using PXE, PartClone, and a Web GUI to tie it together. It includes features like memory and disk tests, disk wipe, AV scan and task scheduling.
FreeFileSync is free, open source software that helps you synchronize files and folders on Windows, Linux and Mac OS X. It is designed to save you time setting up and running data backups while giving nice visual feedback along the way. This file and folder synchronization tool can be very useful for backup purposes, and it receives very good reviews from its users.
FullSync is a powerful tool that helps you keep multiple copies of various data in sync. For example, it can update your website using (S)FTP, back up your data or refresh a working copy from a remote server. Built for developers, FullSync offers synchronization capabilities suitable for backup purposes or for publishing Web pages. Features include multiple modes, flexible rules, a scheduler, support for multiple file transfer protocols and more.
Grsync provides a graphical interface for rsync, a popular command line synchronization and backup tool. It’s useful for backup, mirroring, replication of partitions, etc. It’s a hack/port of Piero Orsoni’s wonderful Grsync – rsync frontend in GTK – to Windows (win32).
Award-winning LuckyBackup offers simple, fast backup. Note that while it is available in a Windows version, that version is still under development. Its features include backup using snapshots, various checks to keep data safe, a simulation mode, remote connections, an easy restore procedure, the ability to add or remove any rsync option, folder synchronization, exclusion of data from tasks, execution of other commands before or after a task, scheduling, tray notification support, and e-mail reports.
Mondo Rescue is a GPL disaster recovery solution. It supports Linux (i386, x86_64, ia64) and FreeBSD (i386). It’s packaged for multiple distributions (Fedora, RHEL, openSuSE, SLES, Mandriva, Mageia, Debian, Ubuntu, Gentoo). It supports tapes, disks, network and CD/DVD as backup media, multiple filesystems, LVM, software and hardware Raid, BIOS and UEFI.
Winner of the most original name for backup software: “OBligatory NAMe”. This app performs snapshot backups that can be stored on local disks or online storage services. Features include easy usage, snapshot backups, data de-duplication across files and backup generations, encrypted backups, and support for both PUSH (i.e. run on the client) and PULL (i.e. run on the server) methods.
Partimage is open source disk backup software. It saves partitions having a supported filesystem on a sector basis to an image file. Although it runs under Linux, it supports Windows filesystems as well as most Linux filesystems. The image file can be compressed to save disk space and transfer time and can be split into multiple files to be copied to CDs or DVDs. Partitions can be saved across the network using the Partimage network support, or using Samba / NFS (Network File System). This provides the ability to perform a hard disk partition recovery after a disk crash. Partimage can be run as part of your normal system or stand-alone from the live SystemRescueCd. This is helpful when the operating system cannot be started. SystemRescueCd comes with most of the data recovery software for Linux that you may need.
Partimage will only copy data from the used portions of the partition (this is why it only works for supported filesystems). For speed and efficiency, free blocks are not written to the image file. This is unlike other commands, which also copy unused blocks. Since the partition is processed on a sequential sector basis, the disk transfer rate is maximized and seek time is minimized. Partimage also works for very full partitions. For example, a full 1 GB partition may be compressed down to 400 MB.
Easy rescue system with GUI tools for full system backup, bare metal recovery, partition editing, recovering deleted files, data protection, web browsing, and more. Uses partclone (like Clonezilla) with a UI like Ghost or Acronis. Runs from CD/USB.
Rsnapshot is a filesystem snapshot utility for making backups of local and remote systems. Using rsync and hard links, it is possible to keep multiple, full backups instantly available. The disk space required is just a little more than the space of one full backup, plus incrementals. Depending on your configuration, it is quite possible to set up in just a few minutes. Files can be restored by the users who own them, without the root user getting involved. There are no tapes to change, so once it’s set up, you may never need to think about it again. rsnapshot is written entirely in Perl. It should work on any reasonably modern UNIX compatible OS, including: Debian, Redhat, Fedora, SuSE, Gentoo, Slackware, FreeBSD, OpenBSD, NetBSD, Solaris, Mac OS X, and even IRIX.
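The hard-link idea described above can be sketched in a few lines of Python. This is only an illustration of the technique, not rsnapshot’s own code (rsnapshot is written in Perl and drives rsync); the function name snapshot and the directory arguments are hypothetical. Files that have not changed since the previous snapshot are hard-linked, so they take no extra data space, while new or modified files are copied.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Create a full-looking snapshot of `source` in `target`.

    Files identical to their copy in `previous` (the last snapshot, or None)
    are hard-linked instead of copied. Hypothetical helper for illustration.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: share the existing data blocks
            else:
                shutil.copy2(src, new)  # new or modified: store a fresh copy
```

Every snapshot directory looks like a complete backup, yet the total space used is roughly one full copy plus whatever changed between runs. Hard links only work within a single filesystem, so this sketch assumes all snapshots live on the same volume.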
Rsync is a fast and extraordinarily versatile file copying tool for both remote and local files. Rsync uses a delta-transfer algorithm which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand. At first glance this may seem impossible because the calculation of diffs between two files normally requires local access to both files.
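The trick that makes this possible is a cheap rolling checksum: the receiver sends checksums of fixed-size blocks of its old file, and the sender slides a window over the new file looking for blocks the receiver already has. The following Python sketch illustrates that idea under simplifying assumptions. It is not rsync’s actual code or wire format; the function names are hypothetical, and SHA-256 stands in here for whatever strong hash a real implementation uses.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum


def weak_checksum(block):
    """Two 16-bit sums over a block; cheap to update when the window slides."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def signatures(old, size):
    """Receiver side: index every full block of the old file."""
    table = {}
    for start in range(0, len(old) - size + 1, size):
        block = old[start:start + size]
        entry = (hashlib.sha256(block).digest(), start // size)
        table.setdefault(weak_checksum(block), []).append(entry)
    return table


def delta(new, table, size):
    """Sender side: emit ("copy", block_index) or ("data", literal_bytes)."""
    ops, literal, pos = [], bytearray(), 0
    ab = weak_checksum(new[:size]) if len(new) >= size else None
    while pos + size <= len(new):
        match = None
        for strong, index in table.get(ab, []):
            if hashlib.sha256(new[pos:pos + size]).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += size
            if pos + size <= len(new):
                ab = weak_checksum(new[pos:pos + size])
        else:
            literal.append(new[pos])
            if pos + size < len(new):
                ab = roll(ab[0], ab[1], new[pos], new[pos + size], size)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, size):
    """Receiver side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * size:(value + 1) * size] if kind == "copy" else value
    return bytes(out)
```

Only the "data" operations carry file content across the link; "copy" operations are just block numbers, which is why small edits to a large file transfer quickly.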
SafeKeep is a centralized and easy to use backup application that combines the best features of a mirror and an incremental backup. It sets up the appropriate environment for compatible backup packages and simplifies the process of running them. For Linux users only, SafeKeep focuses on security and simplicity. It’s a command line tool that is a good option for a smaller environment.
This application allows you to keep your files and folders updated and synchronized. Key features include an easy to use interface, blacklisting, analysis and restore. It is also cross-platform.
Synbak is software designed to unify several backup methods. It provides a powerful reporting system and a very simple interface for configuration files. Synbak is a wrapper for several existing backup programs, supplying the end user with a common configuration method that manages the execution logic for every single backup and gives detailed reports of backup results. Synbak can make backups using rsync over ssh, the rsync daemon, the SMB and CIFS protocols (using internal automount functions), tar archives (tar, tar.gz and tar.bz2), tape devices (including multi-loader tape changers), LDAP databases, MySQL databases, Oracle databases, CD-RW/DVD-RW, and wget to mirror HTTP/FTP servers. It officially supports only the Red Hat Enterprise Linux and Fedora Core GNU/Linux distributions.
Designed to be as easy to use as possible, Snap Backup backs up files with just one click. It can copy files to a flash drive, external hard drive or the cloud, and it includes compression capabilities. The first time you run Snap Backup, you configure where your data files reside and where to create backup files. Snap Backup will also copy your backup to an archive location, such as a USB flash drive (memory stick), external hard drive, or cloud backup. Snap Backup automatically puts the current date in the backup file name, relieving you of the tedious task of renaming your backup file every time you back up. The backup file is a single compressed file that can be read by zip programs such as gzip, 7-Zip, The Unarchiver, and Mac’s built-in Archive Utility.
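The date-stamped, single-file approach is easy to reproduce for a quick script of your own. Here is a minimal Python sketch of the same idea; it is not Snap Backup’s code, and the function name, the file name pattern and the directory arguments are assumptions made for illustration.

```python
import datetime
import os
import zipfile


def dated_backup(source_dir, backup_dir, today=None):
    """Zip `source_dir` into `backup_dir` with the date in the file name.

    Hypothetical helper; `backup_dir` should live outside `source_dir`.
    """
    today = today or datetime.date.today()
    os.makedirs(backup_dir, exist_ok=True)
    archive = os.path.join(backup_dir, f"backup-{today.isoformat()}.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the source so the archive is portable.
                bundle.write(full, os.path.relpath(full, source_dir))
    return archive
```

Because the date is part of the name, running the function on different days leaves one archive per day instead of overwriting the previous backup.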
File synchronization and backup software. Back up data and synchronize PCs, Macs, servers, notebooks, and online storage space. You can set up as many different jobs as you need and run them manually or using the scheduler. Syncovery works with local hard drives, network drives and any other mounted volumes. In addition, it comes with support for FTP, SSH, HTTP, WebDAV, Amazon S3, Google Drive, Microsoft Azure, SugarSync, box.net and many other cloud storage providers. You can use ZIP compression and data encryption. On Windows, the scheduler can run as a service – without users having to log on. There are powerful synchronization modes, including Standard Copying, Exact Mirror, and SmartTracking. Syncovery features a well designed GUI to make it an extremely versatile synchronizing and backup tool.
XSIBackup can back up VMware ESXi environments version 5.1 or greater. It’s a command line tool with a scheduler, and it runs directly on the hypervisor. XSIBackup is a free alternative to commercial software like Veeam Backup.
A client-server system, UrBackup does both file and image backups. UrBackup is an easy to set up Open Source client/server backup system that, through a combination of image and file backups, accomplishes both data safety and a fast restoration time. File and image backups are made while the system is running without interrupting current processes. UrBackup also continuously watches folders you want backed up in order to quickly find differences to previous backups. Because of that, incremental file backups are really fast. Your files can be restored through the web interface, via the client or Windows Explorer, while the backups of drive volumes can be restored with a bootable CD or USB stick (bare metal restore). A web interface makes setting up your own backup server easy.
This file synchronization tool goes beyond the capabilities of most backup systems, because it can reconcile several slightly different copies of the same file stored in different places. It can work between any two (or more) computers connected to the Internet, even if they don’t have the same operating system. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.
Unison shares a number of features with tools such as configuration management packages (CVS, PRCS, Subversion, BitKeeper, etc.), distributed filesystems (Coda, etc.), uni-directional mirroring utilities (rsync, etc.), and other synchronizers (Intellisync, Reconcile, etc). Unison runs on both Windows and many flavors of Unix (Solaris, Linux, OS X, etc.) systems. Moreover, Unison works across platforms, allowing you to synchronize a Windows laptop with a Unix server, for example. Unlike simple mirroring or backup utilities, Unison can deal with updates to both replicas of a distributed directory structure. Updates that do not conflict are propagated automatically. Conflicting updates are detected and displayed.
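The difference from one-way mirroring comes down to remembering the last synchronized state and comparing both replicas against it. The Python sketch below illustrates that reconciliation rule only; it is not Unison’s implementation, and the function name and the dictionary representation (path mapped to a content fingerprint, with a missing key meaning the file does not exist) are assumptions.

```python
def reconcile(base, left, right):
    """Compare two replicas against the last synchronized state `base`.

    Returns (to_left, to_right, conflicts): the values to propagate to each
    replica (None means "delete") and the paths changed on both sides.
    """
    to_left, to_right, conflicts = {}, {}, []
    for path in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(path), left.get(path), right.get(path)
        if l == r:
            continue                # replicas already agree
        if l == b:
            to_left[path] = r       # only the right replica changed
        elif r == b:
            to_right[path] = l      # only the left replica changed
        else:
            conflicts.append(path)  # both changed differently: ask the user
    return to_left, to_right, conflicts
```

For example, if a laptop edits notes.txt while a server adds report.txt, the edit flows to the server and the new file flows to the laptop; a file edited differently on both ends is reported rather than silently overwritten.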
This program is designed to write a raw disk image to a removable device or backup a removable device to a raw image file. It is very useful for embedded development, namely Arm development projects (Android, Ubuntu on Arm, etc). Averaging more than 50,000 downloads every week, this tool is a very popular way to copy a disk image to a new machine. It’s very useful for systems administrators and developers.
Open Source Cloud Data Storage Solutions
Camlistore is short for “Content-Addressable Multi-Layer Indexed Storage.” Camlistore is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem. It is still under active development. If you’re a programmer or fairly technical, you can probably get it up and running and get some utility out of it. Many bits and pieces are actively being developed, so be prepared for bugs and unfinished features.
Apache’s CloudStack project offers a complete cloud computing solution, including cloud storage. Key storage features include tiering, block storage volumes and support for most storage hardware.
CloudStack is open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide an on-premises (private) cloud offering, or as part of a hybrid cloud solution.
CloudStack is a turnkey solution that includes the entire “stack” of features most organizations want with an IaaS cloud: compute orchestration, Network-as-a-Service, user and account management, a full and open native API, resource accounting, and a first-class User Interface (UI). It currently supports the most popular hypervisors: VMware, KVM, Citrix XenServer, Xen Cloud Platform (XCP), Oracle VM server and Microsoft Hyper-V.
CloudStore synchronizes files between multiple locations. It is similar to Dropbox, but it’s completely free and, as noted by the developer, does not require the user to trust a US company.
Cozy is a personal cloud solution that allows users to “host, hack and delete” their own files. It stores calendar and contact information in addition to documents, and it also has an app store with compatible applications.
Designed for Amazon Web Services users, DREBS stands for “Disaster Recovery for Elastic Block Store.” It runs on Amazon’s EC2 service and takes periodic snapshots of EBS volumes for disaster recovery purposes. It is designed to be run on the EC2 host to which the EBS volumes to be snapshotted are attached.
DuraCloud is a hosted service and open technology developed by DuraSpace that makes it easy for organizations and end users to use cloud services. DuraCloud leverages existing cloud infrastructure to enable durability and access to digital content. It is particularly focused on providing preservation support services and access services for academic libraries, academic research centers, and other cultural heritage organizations. The service builds on the pure storage from expert storage providers by overlaying the access functionality and preservation support tools that are essential to ensuring long-term access and durability. DuraCloud offers cloud storage across multiple commercial and non commercial providers, and offers compute services that are key to unlocking the value of digital content stored in the cloud. DuraCloud provides services that enable digital preservation, data access, transformation, and data sharing. Customers are offered “elastic capacity” coupled with a “pay as you go” approach. DuraCloud is appropriate for individuals, single institutions, or for multiple organizations that want to use cross-institutional infrastructure. DuraCloud became available as a limited pilot in 2009 and was released broadly as a service of the DuraSpace not-for-profit organization in 2011.
This app allows users to set up cloud-based storage services on their own servers. It supports FTP, SFTP or FTPS file syncing.
Pydio is a mature open source alternative to Dropbox and Box for the enterprise. Formerly known as AjaXplorer, this app helps enterprises set up a file-sharing service on their own servers. It’s very easy to install and offers an attractive, intuitive interface.
With Seafile you can set up your own private cloud storage server or use their hosted service, which is free for up to 1GB. Seafile is an open source cloud storage system with privacy protection and teamwork features. Collections of files are called libraries. Each library can be synced separately. A library can also be encrypted with a user-chosen password. Seafile also allows users to create groups and easily share files with them.
Another self-hosted cloud storage solution, SparkleShare is a good storage option for files that change often and are accessed by a lot of people. (It’s not as good for complete backups.) Because it was built for developers, it also includes Git. SparkleShare is open-source client software that provides cloud storage and file synchronization services. By default, it uses Git as a storage backend. SparkleShare is comparable to Dropbox, but the cloud storage can be provided by the user’s own server, or a hosted solution such as GitHub. The advantage of self-hosting is that the user retains absolute control over their own data. In the simplest case, self-hosting only requires SSH and Git.
Syncany is a cloud storage and filesharing application with a focus on security and abstraction of storage. It is similar to Dropbox, but you can use it with your own server or one of the popular public cloud services like Amazon, Google or Rackspace. It encrypts files locally, adding security for sensitive files.
Syncthing was designed to be a secure and private alternative to public cloud backup and synchronization services. It is a continuous file synchronization program. It synchronizes files between two or more computers. It offers strong encryption and authentication capabilities and includes an easy-to-use GUI.
PerlShare is another Dropbox alternative, allowing users to set up their own cloud storage servers. Windows and OS X support is under development, but it works on Linux today.
Seafile offers open source cloud storage and file synchronization. You can self-host with the free community edition or the paid professional edition, or you can pay for the hosted service.
Storage Management / SDS
Advanced OpenSDS APIs enable enterprise storage features to be fully utilized by OpenStack. For end users, OpenSDS offers free choice and allows you to choose solutions from different vendors. Start transforming your IT infrastructure into a platform for cloud-native workloads and accelerate new business rollouts.
CoprHD is an open source software defined storage controller and API platform by Dell EMC. It enables policy-based management and cloud automation of storage resources for block, object and file storage providers.
REX-Ray is a Dell EMC open source project. It’s a container storage orchestration engine enabling persistence for cloud native workloads. New updates and features contribute to enterprise readiness, as {code} by Dell EMC through REX-Ray and libStorage works with industry organizations to ensure long-lasting interoperability of storage in Cloud Native through a universal Container Storage Interface.
From their website: “Nexenta is the global leader in Open Source-driven Software-Defined Storage – what we call Open Software-Defined Storage (OpenSDS). We uniquely integrate software-only “Open Source” collaboration with commodity hardware-centric “Software-Defined Storage” (SDS) innovation.”
libvirt is an open source API, daemon and management tool for managing platform virtualization. It can be used to manage KVM, Xen, VMware ESX, QEMU and other virtualization technologies. These APIs are widely used in the orchestration layer of hypervisors in the development of a cloud-based solution.
Online Hierarchical Storage Manager (OHSM) is the first attempt at an enterprise-level open source data storage manager that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise’s data on slower devices, and then copy data to faster disk drives when needed. In effect, OHSM turns the fast disk drives into caches for the slower mass storage devices. Data center administrators set policies that determine which data can safely be moved to slower devices and which data should stay on the fast devices. When data is moved manually, data centers suffer downtime and changes to the namespace. Policy rules specify both initial allocation destinations and relocation destinations as priority-ordered lists of placement classes. Files are allocated in the first placement class in the list if free space permits, in the second class if no free space is available in the first, and so forth.
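The allocation rule in the last two sentences amounts to a first-fit search over a priority-ordered list. A minimal Python sketch of that rule follows; it is not OHSM code, and the function name, the class names and the free-space bookkeeping are hypothetical.

```python
def allocate(file_size, placement_classes, free_space):
    """Pick the first placement class in priority order with enough room.

    `placement_classes` is the priority-ordered list from a policy rule and
    `free_space` maps each class name to its free bytes (updated in place).
    Returns the chosen class name, or None if no class can hold the file.
    """
    for name in placement_classes:
        if free_space.get(name, 0) >= file_size:
            free_space[name] -= file_size
            return name
    return None
```

With a policy of ["fast_disk", "slow_disk"], files land on the fast tier until it fills up and then spill over to the slow tier, without any change to the rule itself.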
Open Source Data Destruction Solutions
With BleachBit you can free cache, delete cookies, clear Internet history, shred temporary files, delete logs, and discard junk you didn’t know was there. Designed for Linux and Windows systems, it wipes clean thousands of applications including Firefox, Internet Explorer, Adobe Flash, Google Chrome, Opera, Safari, and more. Beyond simply deleting files, BleachBit includes advanced features such as shredding files to prevent recovery, wiping free disk space to hide traces of files deleted by other applications, and vacuuming Firefox to make it faster.
Darik’s Boot and Nuke (“DBAN”) is a self-contained boot image that securely wipes the hard disks of most computers. DBAN is appropriate for bulk or emergency data destruction. This app can securely wipe an entire disk so that the data cannot be recovered. The owner of the app, Blancco, also offers related paid products, including some that support RAID.
Eraser is a secure data removal tool for Windows. It completely removes sensitive data from your hard drive by overwriting it several times with carefully selected patterns. It erases residue from deleted files, erases MFT and MFT-resident files (for NTFS volumes) and Directory Indices (for FAT), and has a powerful and flexible scheduler.
FileKiller is another option for secure file deletion. It allows the user to determine how many times deleted data is overwritten depending on the sensitivity of the data being deleted. It offers fast performance and can handle large files.
It features high performance, a choice of the number of overwrite iterations (1 to 100), a choice of overwrite method (blanks, random data, or a user-defined ASCII character), and deletion of the filename as well as the data. No setup is needed: you get just a single executable, and it requires .NET 3.5.
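To make the overwrite-then-delete idea concrete, here is a rough Python sketch of the general technique. It is not FileKiller’s code (FileKiller is a .NET executable); the function name and parameters are hypothetical. Note also that on SSDs and on journaling or copy-on-write filesystems, overwriting a file in place does not guarantee that the original blocks are destroyed.

```python
import os


def overwrite_and_delete(path, passes=3, pattern=None):
    """Overwrite a file in place `passes` times, then rename and delete it.

    `pattern` is a single byte such as b"\x00" (blanks) or b"X"; when it is
    None, random data is written instead. Hypothetical helper.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    if pattern is not None and len(pattern) != 1:
        raise ValueError("pattern must be exactly one byte")
    length = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = length
            while remaining:
                chunk = min(remaining, 1 << 20)  # write at most 1 MiB at a time
                handle.write(os.urandom(chunk) if pattern is None else pattern * chunk)
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())            # push this pass out to the device
    # Replace the file name with a random one before unlinking.
    scrubbed = os.path.join(os.path.dirname(path), os.urandom(8).hex())
    os.rename(path, scrubbed)
    os.remove(scrubbed)
```

More passes cost proportionally more time, which is why tools of this kind let you pick the number of iterations according to how sensitive the data is.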
Open Source Distributed Storage/Big Data Solutions
Bigdata describes itself as an ultra high-performance graph database supporting the RDF data model. It can scale to 50 billion edges on a single machine. Paid commercial support is available for this product.
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. This project is so well known that it has become nearly synonymous with big data.
HPCC Systems (High Performance Computing Cluster) is an open source, massively parallel computing platform for big data processing and analytics. It is intended as an alternative to Hadoop. It is a distributed data storage and processing platform that scales to thousands of nodes. It was developed by LexisNexis Risk Solutions, which also offers paid enterprise versions of the software.
Sheepdog is a distributed object storage system for volume and container services that manages disks and nodes intelligently. Sheepdog features ease of use and simplicity of code, and it can scale out to thousands of nodes. The block-level volume abstraction can be attached to QEMU virtual machines and Linux SCSI Target, and it supports advanced volume management features such as snapshots, cloning, and thin provisioning. The object-level container abstraction is designed to be compatible with the OpenStack Swift and Amazon S3 APIs and can be used to store and retrieve any amount of data with a simple web services interface.
Open Source Document Management Systems (DMS) Solutions
bitfarm-Archiv Document Management
bitfarm-Archiv document management is intuitive, award-winning software with fast user acceptance. Its extensive and practical functionality and excellent adaptability make this open source DMS one of the most powerful document management, archiving, and ECM solutions for institutions in all sectors at low cost. A paid enterprise version and paid services are available.
Highly rated DSpace describes itself as “the software of choice for academic, non-profit, and commercial organizations building open digital repositories.” It offers a Web-based interface and very easy installation.
Epiware offers customizable, Web-based document capture, management, storage, and sharing. Paid support is also available.
LogicalDOC is Web-based, open source document management software that is very simple to use and suitable for organizations of any size and type. It uses best-of-breed Java technologies such as Spring, Hibernate and AJAX and can run on any system, from Windows to Linux or Mac OS X. The features included in the community edition, including light workflow, version control and a full-text search engine, help manage the document lifecycle, encourage cooperation, and let you quickly find the document you need. The application is implemented as a plugin system, so new features can be added easily through its predefined extension points. Moreover, the presence of Web services ensures that LogicalDOC can be easily integrated with other systems.
OpenKM integrates all essential document management, collaboration and advanced search functionality into one easy to use solution. The system also includes administration tools to define the roles of various users, access control, user quotas, levels of document security, detailed activity logs and automation setup. OpenKM builds a highly valuable repository of corporate information assets to facilitate knowledge creation and improve business decision making, boosting workgroup and enterprise productivity through shared practices, better customer relations, faster sales cycles, improved product time-to-market, and better-informed decision making.
Open Source Encryption Solutions
Downloaded nearly 3 million times, AxCrypt is one of the leading open source file encryption tools for Windows. It works with the Windows file manager and with cloud-based storage services like Dropbox, Live Mesh, SkyDrive and Box.net. It offers personal privacy and security with AES-256 file encryption and compression for Windows. Double-click to automatically decrypt and open documents.
Extremely lightweight, the 44KB Crypt promises very fast encryption and decryption. You don’t need to install it, and it can run from a thumb drive. This tool is command line only, as expected for such a lightweight application.
GNU Privacy Guard (GnuPG or GPG) is a free software replacement for Symantec’s PGP cryptographic software suite. GnuPG is compliant with RFC 4880, which is the IETF standards track specification of OpenPGP. GNU’s implementation of the OpenPGP standard allows users to encrypt and sign data and communication. It’s a very mature project that has been under active development for well over a decade.
gpg4win (GNU privacy guard for Windows)
See above. This is a port of the Linux version of GPG. It’s easy to install and includes plug-ins for Outlook and Windows Explorer.
See above. This project ports GPG to the Mac.
TrueCrypt is a discontinued source-available freeware utility used for on-the-fly encryption (OTFE). It can create a virtual encrypted disk within a file, or encrypt a partition or the whole storage device (with pre-boot authentication). Extremely popular, this utility has been downloaded millions of times.