#docker #fediverse #bestpractices #homelab

I'd like to say I've stood up my fair share of projects over the years, both personally and professionally. From running tiny open source projects like WriteFreely here to standing up massive microservices in Kubernetes for large companies, I've done my fair share of Docker.

Over the past weekend I've been trying to stand up a Fediverse instance. Those of you who know me know that I'm fairly passionate about the fediverse: it lets us democratize social media a bit by hosting our own versions of Insta/FB/etc and letting our self-hosted instances talk to each other.

However, what started as a very fun idea to stand up my own instance has instead turned into over a week of frustration and questioning, and it inspired this post today.

So rather than talking about hosting, or Kubernetes, or Docker, today I want to write down some essential dos and, maybe more importantly, do-not-dos for Docker images. None of these are me saying "You did this wrong", but more me trying to show the multitude of ways that people use Docker, and how to make sure people find your Docker image easy to use instead of frustrating.

I won't point fingers or use any exact examples, but I will come up with examples of both the negative and positive cases of each.

Don't: Write shell scripts for building/running containers

To be clear, I don't mean scripts inside the container, but install scripts used to build and run your containers, scripts that actually call docker build or docker run.

I've seen this trend in a lot of open source projects, and it's a bit of a pet peeve, because I know maintainers are trying to make images easy to use for their users. After all, we all know plenty of people will demand "Just give me the exe". (If you don't know that one, I warn you there's plenty of profanity, but here's the link.) However, there's a line that can be crossed, where making it so easy that containers "just start" hides a lot of the crucial info that is required for actually running a container.

I've spent hours reading through install.sh scripts trying to figure out how maintainers actually build their Docker containers, when really all I want to know is what steps are needed to build the image, or even just how to run the container.

The reason this is difficult is simply this: what if I'm not running this container on this machine? What if your user is on Windows? Mac? FreeBSD? What if the machine I'm going to run the container on is in the cloud, across the room, or, in my case, a Kubernetes cluster? I will not be using docker run or docker compose, so while you've made it marginally easier for people using those tools, everyone else is left with a version of your software that's even harder to stand up.

Docker was built to be able to run your code on any machine already, with a standard set of rules and guidelines. Adding custom scripts into the mix means that the process has deviated from the standard Docker guidelines, and new users will need to learn your specific way of standing up your containers.

Do: Write clear documentation on building containers

It may seem backwards, but providing clear and concise documentation of your environment variables, volumes, ports, and other Docker-related items is much more helpful than a script.

You can run any container by knowing the environment variables, volumes, and ports – and providing that documentation really is all you should need to do to get your users up and running.

As for the “Give me the exe” folks, well, if you want to self host, learning docker is one of the best ways to get started.

Environment Variables

Below are the environment variables required to run the application, a description of each, and what their expected values would be.

| Name | Example | Description |
| ---- | ------- | ----------- |
| DB_HOST | 127.0.0.1 | The host (or IP address) where the database is located. |
| DB_USERNAME | admin | The user who will log into the database (located at DB_HOST). |
| DB_PASSWORD | (Your DB password) | The password for DB_USERNAME. |

It's tempting to build a nice CLI that lets your users type those values in and sets everything up for them, but by doing that you're only helping the fraction of your users who run this on a single computer. Help all of your users by providing clear documentation instead.

Don't: Use .env files in Docker

Following on from environment variables, another worrying trend I've seen with small projects is the use of .env files in production.

For those who don't know, .env is a growing standard for setting up environment variables for development purposes. I use them myself in python and JS projects. It allows you to set your environment up in a simple file rather than trying to figure out how to set up your system's environment variables while debugging your code.

DB_HOST=127.0.0.1
DB_USER=admin
DB_PASSWORD=foobar!!

The format of a .env file

This is great for development on your machine. It is a maintainability and security nightmare in production. Let's break down a few reasons why you should never use .env files in production.

An additional file is an additional dependency

A .env file is one more file that needs to be maintained and backed up in case a server fails. Rebuilding a container's environment should not depend on a file that can be lost when backups are missed.

Secrets/Credentials

Security is the most damning issue for .env files. I gave the database example above to highlight the point: with .env files, your secrets are stored in plain text somewhere on a computer. Docker, Kubernetes, OpenStack, and nearly every other platform provide some sort of secret store where you can safely place your secrets and inject them into your running container. Personally, I really enjoy 1Password's secret management tools, which handle this for me.

Forcing the use of a .env file in production completely negates all of this security work by forcing passwords to be stored in plaintext.
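As a minimal sketch of what that injection can look like on the application side (the read_secret helper and variable names are just illustrative, not from any particular project), the app accepts either a plain environment variable or a *_FILE variant pointing at a secret mounted by Docker or Kubernetes:

```python
import os

def read_secret(name: str) -> str:
    """Read a secret from NAME_FILE (a mounted secret file) or fall back to NAME."""
    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        # e.g. DB_PASSWORD_FILE=/run/secrets/db_password mounted by the orchestrator
        with open(file_path) as f:
            return f.read().strip()
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing required secret: {name}")
    return value

db_password = read_secret("DB_PASSWORD")
```

Either way, the password lives in the orchestrator's secret store, not in a plaintext file sitting next to your compose file.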

Conflicts with Docker principles

This one I have to throw in: containers aim to be as stateless as possible. Introducing state where there doesn't need to be any (especially when environment variables already exist) goes against what containerization is attempting to do.

All major languages/frameworks allow some way of setting configuration values from either a local file (.env, appsettings.json, etc.) or from environment variables and merging them together. I urge developers to learn these mechanisms; it takes a few minutes and will save many headaches later.

Do: Use environment variables

Simple as that: use environment variables to configure your application. Most .env libraries were built to layer on top of the system environment, so let those libraries do their job. If they find a .env file, they'll use it for local development; if they don't, they'll fall back to the environment variables set by the system.
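Here's a rough sketch of that in Python, assuming the python-dotenv package and the variable names from the table above; in production no .env file exists and only the real environment is used:

```python
import os

try:
    # Optional: only used for local development. By default load_dotenv()
    # does NOT override variables that are already set in the environment.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

DB_HOST = os.environ.get("DB_HOST", "127.0.0.1")
DB_USERNAME = os.environ["DB_USERNAME"]  # fail fast if missing
DB_PASSWORD = os.environ["DB_PASSWORD"]
```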

Don't: Share/re-use volumes

I've only seen this in a few projects, but it's a critical one. Never share volumes between containers. It's more hassle than it's worth, and it usually highlights an underlying problem with your project.

I see this mostly in the case of something like an app server that serves HTTP, with workers running in other containers. That in itself is a great paradigm: it lets me add separate workers while keeping my HTTP server responsive. Say, though, that the worker needs to access something from the HTTP server. Well, the first approach might be to just share the same volume. The HTTP server uploads it to /volume/foo.txt, and then the worker can also mount the same volume and read it, right?

Well, yes – if these containers are always running on the same machine. But will they? The example I laid out actually contradicts that thought: the point was that we don't want to overwhelm our API server, so wouldn't the natural next step be to move our worker to another machine? If we require a shared volume, that immediately becomes more complex.

Kubernetes offers a few volume access modes. RWO (ReadWriteOnce) allows a volume to be mounted read-write by only a single node, so containers (pods) can only share it if they land on the same machine. Maybe the worker needs to write back to the API container from somewhere else? Well, Kubernetes offers RWX (ReadWriteMany), which does work – however, it usually requires quite a bit of setup. And how do providers accomplish it? Most RWX providers run an NFS server inside the cluster, which means that local storage is no longer local and writes happen over a network. So to achieve this simple cross-container volume, we've added quite a bit of overhead for a solution that doesn't work as well as we wanted in the first place.

Do: Use a cache or database

A much cleaner approach, and one that was meant to handle this from the start, is to use a cache. Redis is extremely easy to set up and will solve 99% of the issues you may have here. For text, JSON, or anything else you may need to pass between containers, sending it through Redis will probably serve you well.

For small files, like an avatar or thumbnail, Redis can store binary values, and if you need something more durable, most SQL and NoSQL databases allow binary data or attachments. I know many will scoff at the idea, but if the goal is simply to facilitate communication between containers, this can be a fine solution – say, passing a file to a worker that will process it and eventually upload it to S3.
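As a rough sketch with the redis-py client (the queue name, key names, and helper functions are made up for illustration), the API container enqueues work and the worker picks it up, with no shared volume anywhere:

```python
import json
import redis

r = redis.Redis(host="redis", port=6379)

# API container: stash the small blob and enqueue a job instead of writing to a shared volume
def enqueue_upload(file_name: str, data: bytes) -> None:
    r.set(f"upload:{file_name}", data, ex=3600)        # expires after an hour
    r.rpush("jobs", json.dumps({"file": file_name}))    # tell the worker about it

# Worker container: block until a job arrives, then fetch the blob and process it
def work_forever() -> None:
    while True:
        _, raw = r.blpop("jobs")
        job = json.loads(raw)
        data = r.get(f"upload:{job['file']}")
        # ... process data, upload the result to object storage, etc.
```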

Don't: Write files around the filesystem

A large source of code smell is files written all around the filesystem. If your code needs some files under /var, others under /data, and maybe a few more somewhere else entirely, it becomes very difficult to track them all. Confusion, though, is just one problem.

Kubernetes and containerd can also refuse to let you modify files that aren't in a volume. Run a container with a read-only root filesystem – a common hardening option – and everything outside of your mounts is immutable (k3s, for example, uses containerd as its default runtime, which is good because it's more lightweight). Attempting to write to a path that is not explicitly mounted as a writable volume will cause your application to throw "read-only file system" errors.

There are exceptions to this. /var/log, for example, is commonly mapped to temporary scratch storage so logs can be written there, as is standard. /tmp is the same: anything written to /tmp is assumed to be temporary and disposable.

If you need to write to some arbitrary file, though, say /var/foo/bar, the write will fail, and that directory will need to be explicitly mapped by your users before your application can write to it.

Do: Choose one workspace to work out of

There is nothing wrong with needing persistent storage, but try to keep it to the minimum number of volumes. If everything lives under /data, your users only need to create one and only one volume to persist it.

However conversely:

Don't: Store temporary data in persistent storage

The exact opposite of the above: if you have temporary data, store it in temporary storage. Most persistent storage options assume that storage can always grow but never shrink, which means that if you store logs under /data/logs, /data is a persistent volume, and those logs grow to gigabytes in size, your users end up paying for gigabytes of old logs.

Do: Use standard locations for temporary storage

Put logs under /var/log and temporary storage under /tmp and everything will work perfectly.
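For example (just a sketch), scratch files can go through the standard temp directory instead of whatever persistent mount your app happens to have:

```python
import tempfile

# Scratch space lives under the system temp dir (/tmp in most containers),
# so it never bloats a persistent volume like /data.
with tempfile.NamedTemporaryFile(suffix=".part") as scratch:
    scratch.write(b"intermediate results nobody needs tomorrow")
    scratch.flush()
    # ... do the heavy processing against scratch.name here ...
```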

Don't: Log to a file

Or don't store logs at all! On the subject of logs, in a containerized world – don't worry about them.

If I have a container acting up and want to see the logs, I vastly prefer using docker logs or kubectl logs over needing to exec into the container to tail a logfile.

Do: Print logs to the console

Kubernetes, Docker, and all of the other orchestration frameworks can then consume your logs and rotate them accordingly. Every logging framework can dump logs to the console; there is no need to write your containerized logs to a file at all.
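A minimal sketch in Python: point the standard logging module at stdout and let the orchestrator do the rest (docker logs / kubectl logs will pick it up):

```python
import logging
import sys

# Send everything to stdout; Docker/Kubernetes capture and rotate it for us.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

log = logging.getLogger("myapp")
log.info("started, listening on port %s", 8080)
```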

My final one for today is the worst offender I have. This one is important, folks.

NEVER: put source code in volumes.

This one is so bad that I'll call out a direct example: Nextcloud. Nextcloud is a PHP application, and their AIO container is one of the most frustrating I've worked with, mostly because their source code is actually stored in the persistent volume. There are many, many reasons why this is bad practice and goes directly against the very idea of containerization, but I'll call out a couple.

Containers should never need to be “updated”

Containers should be all-encompassing for their environment. There should never be a case where a container starts and some packages are out of date or some code is the wrong version – that's the point of containerization. When a container starts, it contains everything it needs to run.

No version control

Since persistent volumes are outside of a git repo, there is no way to update the versions without writing your own update functionality. There are simply too many ways that code in the volume can become inconsistent with what is in the container.

Do: Make use of temporary data and configuration

Code should remain under its own directory, with persistent data under a completely separate directory. If you're building your own container, I recommend /app for your workspace and /data as the mount point for any persistent storage. To go above and beyond, add a MOUNT_POINT environment variable that defaults to /data but lets your users mount anywhere on the filesystem; that gives them the most flexibility, and you no longer need to worry about where in the system your mount points are.
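A tiny sketch of that idea in Python (MOUNT_POINT and the save_upload helper are just illustrative names):

```python
import os
from pathlib import Path

# Persistent data root: /data unless the user mounted somewhere else.
DATA_DIR = Path(os.environ.get("MOUNT_POINT", "/data"))

def save_upload(name: str, content: bytes) -> Path:
    """Write user data under the single persistent mount point."""
    path = DATA_DIR / "uploads" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path
```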

If there are dynamic packages or plugins, there should be a way to define them in environment variables, and they should be run ephemerally: on container start, download the plugin to /tmp and load the code from there. On container restart, the same thing happens. This also ensures plugins stay up to date, as the latest version is always downloaded.
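Here's a rough sketch of that startup step, assuming a hypothetical PLUGIN_URLS variable holding a comma-separated list of plugin archives:

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical: PLUGIN_URLS="https://example.com/a.zip,https://example.com/b.zip"
PLUGIN_DIR = Path(tempfile.gettempdir()) / "plugins"

def fetch_plugins() -> list[Path]:
    """Download plugins fresh on every container start; nothing persists."""
    PLUGIN_DIR.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for url in filter(None, os.environ.get("PLUGIN_URLS", "").split(",")):
        target = PLUGIN_DIR / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, target)  # then unpack/import as needed
        downloaded.append(target)
    return downloaded
```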

#homelab #selfhosting #unraid #storagespaces

I debated what I wanted my first main topic to be, and I ended up deciding to start the way I do with most of my projects: from the bottom up. So today I'm going to talk about how I manage my (apparently) large amounts of data.

Storing data has always been at the root of all of my selfhosting and homelabbing. Starting back in the days when an mp3 would take an hour to download, I quickly learned that I did not want to download things multiple times – if you could even find them a second time. So, I learned to download early and often.

As we started using data more, piles of new things cropped up. Photos were going digital, and even some short video files started to appear. I remember being very annoyed that we ran out of disk space and were forced to choose which photos to keep and which to delete, and so I started looking into my first few ways to store larger amounts of data.

Today I'm at well over 100TB of data at home, mostly my own personal media, personal projects, backups of backups, and I proudly host cloud drives for my family. Getting here, though, was a long process.

Getting started

As I said in the first post, I started with a rock-solid Pentium 3 with 256MB of RAM way back in the day. At that time, sharing data between computers was as simple for me as setting up a share on Windows and sharing files stored on the main drive of the "server". I ran Windows 2000 Server and hosted shares for the entire house from that PC. (I actually remember installing games to the shares and running games like Age of Empires from other PCs over Windows shares.)

Unfortunately data scales, and the more I stored, the larger the drives I needed. A massive 120GB drive became a 320, and then a 500. Each time I would carefully install the new drive by hooking up the giant ribbon cable, copying everything over through Windows Explorer, and praying that it would all complete cleanly.

Eventually I outgrew what a single drive could handle, and I got my first external drive, an IOGear 1TB "Drive", which was two 500GB drives in a RAID 0 configuration. I'll explain RAID more below, but if you aren't aware, this is essentially two drives working together to appear as a single 1TB drive. At the time I wasn't aware of what RAID was, or that I was actually using it, or how risky it was (especially since it was in an external drive that I took with me everywhere) – but I would learn over time.

This system of a server computer hosting a few drives worked well for years. Into college I continued to grow my data, to the point that I finally bought a 3TB internal Seagate drive, and I was amazed at how much storage it could hold. Unfortunately, this is when I learned my first lesson on the importance of data redundancy, as I woke up one morning and heard the stomach-churning noise: click click click whirrrr – click click click whirrrr.

That was the end of all of that data – about seven years of memories and content, gone overnight. I was devastated, and the thought of paying well over two thousand dollars for professional recovery made a college student living on ramen queasy. So, I had to start all over.

Trial and Error, “pro-sumer” level storage.

I went through a few different solutions for a few years before I solidified my decision on what I would use. I'll go over a few of them now, and why I eventually ended up choosing Unraid. Foreshadowing...

Hardware RAIDs

The first option was probably the easiest to get started with. I needed space, and I needed to store my data in a way that minimized loss if a disk failed. This means combining multiple drives to act as one large drive, which is called an array.

I mentioned RAID 0 above. RAID is a hardware-level option for combining drives in different configurations so that the operating system above it only sees the logical drive the controller exposes. If you want to combine three drives into one super mega drive, that's RAID. You usually configure it in your motherboard's BIOS, and then maybe install a driver in Windows to expose the new drive. To Windows, it looks like any other drive.

An example of a RAID configuration in the BIOS

RAID has a few configurations. I'll talk about the core ones, but there's a complete list on the Wikipedia page.

  • RAID 0 – All data is striped. This means that you are maximizing your capacity. With N drives, your data is split into N chunks and distributed across all of those drives. Like slicing a loaf of bread, the first slice goes to drive 1, the second slice to drive 2, and on and on until you're out of drives and it starts over at 1.

    • 0 maximizes storage, but your exposure to failure is extremely high. If any drive fails, you lose the entire array. There is no redundancy; it is simply lost.
    • 0, however, is great for speed. Since you are pulling from N drives, you get roughly (slowest drive speed × N), with the only cap being what your RAID controller (motherboard) can handle. RAID 0 is a popular choice in gaming and professional server applications. (It's actually what I still use in my main gaming PC – I have five 2TB SSDs in a RAID 0, and games are fast.)
  • RAID 1 – All data is mirrored. Storage size is not the priority with RAID 1, but rather redundancy. In our bread example, for each slice of bread it sees, it clones the slice and puts a copy on each drive. So for N drives there are now N loaves of bread. (Okay, the analogy is falling apart; I guess they have Star Trek replicator technology.)

    • 1 has maximum redundancy: if a drive fails you simply replace it, the data is copied to the new drive, and the full array is restored.
    • 1 has no speed implications, as it is still limited by an individual drive's speed
    • 1 is the simplest approach to redundancy – you carry a complete mirror of the drive – which means that when building out your system, you must double all costs to carry the second copy.
    • We are still limited by the size of a single drive, so no additional storage space is gained.
  • RAID 0+1 – Here we're starting to get a bit more clever, but not completely. Data is both striped and mirrored. This is for when you want the extra space and speed of RAID 0, but also the redundancy of RAID 1. Data is first sliced across multiple drives, and then cloned onto the mirror. You get the benefits of 0, with the additional storage and speed, and the safety of 1 because you have the entire mirror – but unfortunately you also get the downsides of both.

    • 0+1 gives us the speed boost of 0 and the mirroring of 1 together
    • 0+1 mitigates the failure mode of 0: if one drive fails, you can recover. However, once a drive fails you are down to a single, un-mirrored copy of your data, and the rebuild will read everything from it – probably its most intense use to date – while you hope nothing else fails before the replacement is rebuilt. If that remaining copy does fail, the entire array is lost.
  • RAID 5 – Finally we arrive at something that might work for a real-world use case. 5 introduces the concept of striping with parity. Parity is going to be a big word as we continue. The concept is that if you take a RAID 0 array, for each bit across the drives you can run a mathematical equation and record the result. If one drive fails, you simply reverse that equation to find out what the value of the failed drive's bit was and store it on the new drive. I'll explain this a bit more below, but essentially the number of drives needed to survive a 1-drive failure is no longer 2N (where N is the number of drives in your array) as with 0+1, but N+1. For N drives, with parity you only need one additional drive. (The exact implementation is a bit different in that it stripes the parity across multiple drives, but for now this explanation will work.) So to break it down:

    • 5 gives us RAID-0 read speeds, with varying write speeds (due to the calculation of the parity).
    • 5 gives us full array redundancy while only needing one extra drive
    • With 5, any one drive can fail and the entire array can be rebuilt from the parity – however...
    • If any one drive fails, the entire array must go through a rebuild cycle to regenerate the failed drive, to bring the system back up to parity.

Again I'll dive into how that parity is calculated below, but that's the gist of it.

Okay, that was a lot, thanks for learning (a few of the standard) RAID types!

Hardware RAIDs are common, and I used one as my primary storage for a while; however, they have a couple of major flaws.

RAIDs are hardware, meaning they work through your motherboard's RAID controller or some other controller card you install. If that controller fails, you are at the mercy of another controller working in exactly the same way, or of finding an identical replacement. Portability to a new computer is nearly non-existent because of this.

RAIDs also, in practice, require that all of the drives be the exact same model – not just the same size, the same model. Remember, the hardware is in control of slicing that data up, which means the interfaces and the way the data is stored must be exactly the same. This is a pretty severe limitation of RAID: you pretty much have to know exactly how you want to build your array before you build it.

But what if they don't make that drive anymore? What if they changed the drive and didn't tell anyone? What if you simply want to add more storage to your array? Well, then it's time to take it to the next level. To software RAIDs.

Windows Storage Spaces

Storage Spaces was my first foray into the world of software RAIDs. Software RAIDs are similar to hardware RAIDs in that they still combine disks, usually with the same basic algorithms, but being software-based gives them additional flexibility – mostly that you don't need to use the same model or even the same capacity of drive. You can add a 4TB drive to an array of 3TB drives and it will work fine. This was a huge deciding factor for me, because I wanted my array to grow with me.

Windows Storage Spaces is Microsoft's built-in approach to handling multiple disks and spreading data across them. I first heard about it through a friend at work, who recommended Storage Spaces. I decided to try it out, just for fun, and created a virtual machine with Windows Server 2012 on it. I attached four virtual "disks" to the virtual machine so I could play with the array. The drives weren't big, only 100MB each, but I was able to create a simple array through Windows' dialogs. The size was fair – it wasn't the full 400MB, but it was clearly keeping a parity copy, so it came out to about 320MB of total space.

I copied some data into the newly formed Storage Spaces drive, and then I proceeded to mess around. I shut down the VM, detached a drive, and watched what would happen. Storage Spaces saw the failed drive and offered to remove it, or I could even still start the array in a "degraded" state. I detached another drive and the array went offline. I could hot-swap drives – I yanked drives while the VM's "power" was still on – and everything stayed stable. I added drives of different sizes. When I was happy with my testing, I installed Server 2012 on a spare computer with a bunch of SATA ports, set up my Storage Spaces, and started the copy.

Configuring Storage Spaces, from the blog I'm pretty sure I read way back in the day to get started

Which took forever. I assumed it was just that my network was slow or something, but I was getting maybe 100Kbps. I had learned the biggest downfall of a software RAID: it's software. Being in software means that parity calculation must also be done in software; there is no specialized hardware calculating the parity for you. Read speeds off the drives were fine, but writing to them became an arduous task. I stuck with Storage Spaces for a while, but it was clear that as long as I used it, I would have to trade write speed for flexibility.

Until...

Unraid

I had been hearing about Unraid on Reddit's /r/DataHoarder for a while. Data hoarder, I realized, was a totally real term that did apply to me: the need to retain and collect all data while the Delete key on the keyboard grows dusty and sad. Essentially we're squirrels, but for data.

Unraid is an entire operating system, meaning it isn't something you can just enable on your existing computer; you will need a machine separate from your primary PC to run it.

Its primary feature is, of course, the data array. Unraid offers a software RAID that's similar to Storage Spaces in that it can use arbitrarily sized disks – whatever you have lying around – to create an array. It creates a "storage pool" using these disks, mimicking a RAID 0 environment, but with an important caveat: it doesn't stripe the data. Instead of striping your data like slicing a loaf of bread, it chooses one of your disks to put the entire "loaf" (or file) on. So when you look at an individual disk, you will see whole files sitting there. The pool/array part is that your directories (folders, for you Windows folks) are split, so one file may be on one disk while another file is on a completely separate disk. The operating system then uses some proprietary magic to create "shares" that combine all of these individual disks to look like one large, cohesive drive.

Image of unraid's main screen, showing multiple drives

So that's storage – what about parity? Storage Spaces reserves a small portion of each attached drive to store parity bits for all of the other drives. Unraid does it differently, so let's talk about parity.

Unraid's Parity system

Parity in Unraid serves the same basic function, but instead of reserving space on each drive, you dedicate a separate drive, similar to the RAID 5 parity mentioned above, to be your parity drive. The caveat is that the parity drive must be at least as large as the largest drive in the array. You'll see why this makes sense in a second.

To calculate parity, on each write Unraid does basically the same thing as RAID 5 and Storage Spaces: it runs a calculation across the corresponding bit on every drive to determine what the parity bit should be. If you have four data drives plus one parity drive, it sums up the four data bits and calculates what the parity bit should be.

For you nerds following along, this calculation is an XOR across the drives. For everyone else, think of it like adding up the 0's and 1's and then deciding if the sum is even or odd. Even is a zero, odd is a one. So if one drive failed, all you would need to do is add up those numbers again and, using the parity, work out what the missing value must have been. If the remaining values add up to even but the parity says the total was odd, then you know the missing bit should be a one. Some examples:

| Drive 1 | Drive 2 | Drive 3 | Drive 4 | Parity |
| ------- | ------- | ------- | ------- | ------ |
| 0       | 0       | 0       | 0       | 0      |
| 1       | 0       | 0       | 0       | 1      |
| 1       | 1       | 0       | 0       | 0      |
| 1       | 1       | 1       | 1       | 0      |

So when writing a file, Unraid checks the bits at the same location on each of the other drives, and then updates the parity drive with the new value. This is why the parity drive must be the largest drive: it must have the capacity to store a parity bit for every bit on the largest data drive in the pool, in case that drive fails.
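To make the even/odd idea concrete, here's a toy sketch of the XOR math in Python (purely an illustration of the concept, not how Unraid actually stores anything):

```python
from functools import reduce

# Four "drives", each holding a single byte for this toy example.
drives = [0b10100001, 0b01101100, 0b00001111, 0b11110000]

# Parity is the XOR of every drive's byte (the per-bit "is the sum even or odd?" trick).
parity = reduce(lambda a, b: a ^ b, drives)

# Simulate losing drive 3 (index 2): XOR the survivors with the parity to rebuild it.
survivors = drives[:2] + drives[3:]
rebuilt = reduce(lambda a, b: a ^ b, survivors, parity)

assert rebuilt == drives[2]
print(f"rebuilt the missing drive as {rebuilt:08b}")
```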

Double Parity

Unraid allows you to set up no parity (please don't do this), single parity like above, or double parity. This is supported by the other systems as well: RAID 6 is RAID 5 with an extra parity drive, and Storage Spaces also supports dual parity.

Why double parity? Well, let's think about how parity works. If a drive fails, your array has no redundancy left; at that moment, you have suffered the maximum number of drive failures it can handle. At the same time, you need to replace the failed disk, and the rebuild process will start. The array rebuild consists of:

  • Reading every bit in sequential order, start to finish, on every drive
  • Calculating what bit should be on the new drive
  • Writing the new bit to that new drive

During a parity rebuild, this operation will run all of your disks at 100% for likely many hours (maybe even days, now that we're running 20+TB drives). Your machine will be at full activity while also producing the most heat it ever will. In essence, these are prime conditions for another drive to fail. If you have another drive teetering on the edge of failure, this would be the time for it to go – and remember, it's at this time that you are at your most vulnerable, with no extra redundancy.

This is why I recommend just biting the bullet and putting the extra drive into your cart. Yes, it's more money, but it'll save the extra stress and anxiety if the worst should happen.

Cache Drive

Unraid supports using a cache drive along with the main array. This is extraordinarily useful because of both the limited write speed of spinning hard disk drives and the overhead of calculating parity. (Remember that calculating parity means spinning up all of your drives at once, running those calculations, then writing to the drive the file is stored on along with updating the parity.) Using a cache drive like an SSD or NVMe drive means that you can write that new file at blazing fast speeds, and Unraid will save it to your full data array later. The tradeoff is that for this short time your file is unprotected by the array, but it allows you to move on to other things.

The process of moving files to the array is (cleverly enough) called the Mover in Unraid. For my own setup, I schedule the Mover around midnight every night; Unraid takes any files on the cache drive and saves them to the array. In the case that my cache drive dies, I've lost at most one day's worth of files.

Full Homelab Suite

Unraid is much more than just a data storage system. It's a full operating system with virtualization built in, so you can run virtual machines and Docker containers right from Unraid itself. This can be an amazing way to get started with homelabbing: there's no need to run separate servers (you can, of course, but it's extremely easy to get your first applications running on Unraid alone). Need Windows? Spin up a Windows VM. Want to run Plex? Use the Plex Docker image and hook your media straight in.

Comparisons

My decision to go with Unraid was a personal one. There are many other storage solutions out there that I could have chosen, more than I can document here, but this is how I ultimately viewed it.

Pros

  • The array can be expanded, and I can use newer drives to let the system grow with me. Where I started with 3TB drives, I just installed my first 22TB drive into the array, all without needing to do a massive copy of all of my data to a new array.
  • Write speeds are better than Storage Spaces because of how parity is calculated.
  • If you add a cache drive, write speeds are much faster than Storage Spaces.
  • Unraid is software, so you aren't dependent on a hardware controller like with a standard RAID. If your motherboard dies, you plug the drives into a new computer and start Unraid. The configuration itself is stored on the array.
  • Unraid has an amazing suite of additional features. It deserves its own post, but if you're getting started you can:
    • Run VMs
    • Run Docker images
    • Download a shocking number of plugins

Cons

  • Not a true RAID array, so no performance gains
  • By keeping files whole rather than striped, you are limited to the speed of a single drive
  • A full OS, so this will really only work as network-attached storage, not on a primary PC
  • Proprietary. While relatively inexpensive (currently $119 for a forever license), it is not open source.
  • Needs to run regular jobs to maintain parity:
    • Parity check (on whatever schedule you prefer; I run it monthly)
    • Mover (and until the Mover has run, your cached data is at risk)

Summing up

Overall, I chose Unraid for its extreme flexibility, even with its slight performance hits. For me, this is my primary network storage, where I store large files that may only be accessed once in a while. If I'm storing things there, I can live with a write operation taking a bit longer than lightning speed. With the cache drive on top, I can get the maximum my network allows and then set it and forget it, with the Mover picking up my writes later.

There are many different options out there. One I looked into but didn't have the space to write about here is TrueNAS; I know a lot of people like it. I've toyed with ZFS a bit on Proxmox too – ultimately it comes down to your use case. There are dozens of comparisons online if you're curious about the options, but I hope you can see the benefits of Unraid and why I personally chose it.

It is a great system to start with if you're curious about homelabbing/self-hosting, as it provides everything out of the box.

This ended up being much longer than I anticipated, but I wanted to give a full idea of how I manage storage. If you read this far, thank you! This was a lot to write so we'll see going forward how my other posts turn out. See you next time!