With inline deduplication, one of the pros is that when it's done, it's done. So when your data is written into the device, dedupe is accomplished and you keep on trucking. You may have less upfront performance compared to some of the post-process vendors because you're doing a lot more work up front as the data comes in. But another pro is that you could have less complexity and, essentially, need less time to accomplish everything.
So, if you look at the production backup process and then the back-end things that run in batch, like dedupe, copies and replication, you may be able to get a lot of your batch work done in less time by having a single step. In the post-process models you have multiple steps: you write into a staging area, then you read out to a dedupe area.
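To make the single-step versus multi-step difference concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the tiny fixed-size chunks, the SHA-256 fingerprints, and names like `DedupeStore`, `inline_backup` and `post_process_dedupe` are hypothetical and don't reflect how any particular vendor's VTL works.

```python
import hashlib

CHUNK_SIZE = 4  # illustrative only; real products use far larger, often variable, chunks


def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks (assumed chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Hypothetical deduped area: each unique chunk is stored once by fingerprint."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> chunk bytes

    def put(self, chunk):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        self.blocks.setdefault(fingerprint, chunk)  # keep only the first copy
        return fingerprint


def inline_backup(data, store):
    """Inline: one step. Data is deduped as it is written."""
    return [store.put(c) for c in chunks(data)]


def post_process_write(data, staging):
    """Post-process step 1: land the raw backup in the staging area at full speed."""
    staging.append(data)


def post_process_dedupe(staging, store):
    """Post-process step 2: read staged backups back out into the dedupe area."""
    recipes = []
    while staging:
        data = staging.pop(0)
        recipes.append([store.put(c) for c in chunks(data)])
    return recipes


def restore(recipe, store):
    """Rebuild a backup from its list of fingerprints."""
    return b"".join(store.blocks[fp] for fp in recipe)
```

With inline, the backup is already deduped when `inline_backup` returns. With post-process, the data sits in `staging` until the separate `post_process_dedupe` pass runs, which is the extra batch step and the extra capacity.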
One of the bigger advantages of inline is that replication of a device from a source to a target can start right away. With post-process there are some other things to think about. You can have a really fast initial write speed, with extremely high performance writing your backup data inbound. This can be very good from the point of view of meeting your backup window and accomplishing extremely fast restores of the most recent data from this device back to clients.
A little caveat here is whether your network and your clients can really keep up with some of the technologies that are out there today. With post-process, dedupe on the back-end also needs to keep up. If anyone has ever run a TSM environment, you know that if you fall behind with your batch operations, say more than two days behind, it's really bad and you will struggle to catch up. That same paradigm really applies to deduplication. With post-process dedupe, it's great if you can write in a lot of data at fast speeds, but it's not great if your dedupe capability on the back-end can't keep up.
So, let's say you're writing in, but you can't write out to your dedupe environment before your next backup. What happens when your front-end cache builds up? That means you pretty much have to wait and try to catch up before your front-end is available or has enough capacity for the next inbound batch of backups.
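A rough way to picture that backlog is a day-by-day simulation. This is only a sketch under simple assumptions: one backup batch per day, a fixed back-end dedupe rate, and the function name `simulate_staging` plus all of the numbers are made up for illustration.

```python
def simulate_staging(daily_ingest_tb, dedupe_rate_tb_per_day, staging_capacity_tb, days):
    """Track how much un-deduped data sits in the front-end staging cache each day.

    Returns a list of (day, accepted_tb, deferred_tb, backlog_tb) tuples, where
    deferred_tb is backup data that could not be written because the cache was full.
    """
    backlog = 0.0
    history = []
    for day in range(1, days + 1):
        # Inbound backups can only use whatever staging space is still free.
        free = staging_capacity_tb - backlog
        accepted = min(daily_ingest_tb, free)
        deferred = daily_ingest_tb - accepted
        backlog += accepted

        # The back-end then drains as much as it can before the next backup window.
        backlog -= min(backlog, dedupe_rate_tb_per_day)
        history.append((day, accepted, deferred, backlog))
    return history


# Assumed example: 10 TB/day inbound, back-end dedupes 8 TB/day, 15 TB of staging.
for day, accepted, deferred, backlog in simulate_staging(10, 8, 15, 5):
    print(f"day {day}: accepted={accepted} TB, deferred={deferred} TB, backlog={backlog} TB")
```

In this made-up case the back-end falls 2 TB further behind each day, and by day 4 the cache is full enough that part of the inbound backup has to wait. Raise the dedupe rate above the daily ingest and the backlog stays at zero.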
So, realistically you're going to size your front-end cache for multiple days, or with some headroom to grow into, so you don't run into that situation. But from a technology selection and design point of view, you've got to push your vendors to make sure that they're going to design the back-end of post-processing to keep up with the front-end capabilities. Otherwise you're going to be buying a half-baked solution.
Another con for post-process is that you're going to need more physical resources because you have to have a staging area as well as capacity on the back-end for deduped storage. But these are all tradeoffs, and honestly I've seen cases where each approach can be very good for large and small customers. So it depends on your requirements and what you want to accomplish.