Published: 10 Sep 2006
New software versions are often plagued by bugs. Symantec has fixed numerous bugs in Veritas NetBackup 6.0 as some users have had major problems upgrading to the newest version of the popular backup application.
When Symantec Corp. released the latest upgrade to Veritas NetBackup 6.0--Maintenance Pack 3 (MP3)--on June 30, Graeme Hackland took a deep breath. As IT manager for the Renault Formula One (F1) Team in Enstone, U.K., he uses Veritas NetBackup 6.0 to protect 80TB of storage that supports the company's F1 racing cars on the track and during the manufacturing process.
"It's working well for me now with Maintenance Pack 2 [MP2], so unless there's a compelling reason for me to go with MP3, I'm not going to rush into it," says Hackland. He has no plans to install MP3, despite the recent e-mail alert from Symantec urging customers to upgrade right away. His cautious approach is born of experience.
By the time the Renault F1 Team upgraded from Veritas NetBackup 5.1 to 6.0 MP2 in May, the product had been shipping for seven months. "We tend not to jump on new revisions, and I was a little nervous about NetBackup 6 because there had been quite a lot of bad press," says Hackland. "I wanted to be sure it was the right thing to do."
Hackland is referring specifically to issues with the new online (hot) catalog backup feature in Veritas NetBackup 6.0, which many early adopters reported was shaky at best and caused data corruption issues in the worst cases. The catalog is the heart of the backup server; it tracks all the backup jobs, file changes, all of the tapes and who has access to what information.
In previous versions of Veritas NetBackup, client backups couldn't run while the catalog was being backed up, which meant users had to schedule a specific time during the day to back up their catalog. With Veritas NetBackup 6.0's hot catalog backup feature, a catalog can be backed up at any time, even when multiple client backups are running. This makes scheduling easier and makes it much less tedious to back up the catalog more frequently.
To enable this new feature, Symantec rearchitected Veritas NetBackup, moving from a flat-file catalog to a Sybase database catalog to improve scalability. In the old flat-file environment, users had to read through the files to find the information they needed; the Sybase database architecture lets them perform queries against the catalog. "It allows them to scale up to much larger environments," says Matt Kixmoeller, Symantec's senior director of product management, Veritas NetBackup.
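The difference between reading through flat files and querying a database can be sketched roughly as follows. This is an illustration only: the record layout, the `images` table and its column names are hypothetical, and Python's built-in `sqlite3` module stands in for the Sybase database, whose actual schema isn't described here.

```python
import sqlite3

# Hypothetical backup records: (client, policy, tape_id).
# This is NOT NetBackup's real catalog layout.
RECORDS = [
    ("web01", "daily", "T0001"),
    ("db01", "daily", "T0002"),
    ("web01", "weekly", "T0003"),
]


def find_tapes_flat(lines, client):
    """Flat-file style: read every line and keep the ones that match."""
    tapes = []
    for line in lines:
        rec_client, _policy, tape_id = line.split(",")
        if rec_client == client:
            tapes.append(tape_id)
    return tapes


def build_db(records):
    """Load the same records into an in-memory database with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (client TEXT, policy TEXT, tape_id TEXT)")
    conn.execute("CREATE INDEX idx_client ON images (client)")
    conn.executemany("INSERT INTO images VALUES (?, ?, ?)", records)
    return conn


def find_tapes_db(conn, client):
    """Database style: ask the engine a question instead of scanning."""
    rows = conn.execute(
        "SELECT tape_id FROM images WHERE client = ? ORDER BY tape_id",
        (client,),
    )
    return [row[0] for row in rows]


flat_lines = [",".join(record) for record in RECORDS]
conn = build_db(RECORDS)
print(find_tapes_flat(flat_lines, "web01"))
print(find_tapes_db(conn, "web01"))
```

Both functions return the same tapes; the point of the sketch is that the flat-file version must touch every record, while the database version can use an index, which matters as the catalog grows.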
Acxiom Corp., an IT services company in Little Rock, AR, upgraded from Version 5.1 to Version 6.0 Maintenance Pack 1 (MP1) on March 21. There were additional scripts to convert everything from flat files to the database format, but that was the least of their worries, says Mark Lutgen, team leader, Unix engineering at Acxiom. It took the company 20 hours to back up a 50GB catalog, a process that would normally take 90 minutes. A bug in the integration between the vaulting feature in Veritas NetBackup--which automates the process of shipping tapes offsite to a disaster recovery (DR) facility--and the hot catalog backup feature caused the vault to keep running, leaving Acxiom no time to run its backups. It also meant the company had to hand-pick the tapes to be sent offsite.
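To put the catalog backup figures in perspective, the slowdown can be worked out from the numbers above. The short calculation below uses only the 50GB, 20-hour and 90-minute figures quoted by Acxiom; the variable and function names are illustrative.

```python
CATALOG_GB = 50       # catalog size reported by Acxiom
NORMAL_HOURS = 1.5    # the usual 90-minute catalog backup
SLOW_HOURS = 20       # duration observed under 6.0 MP1


def throughput_gb_per_hour(size_gb, hours):
    """Average backup rate in GB per hour."""
    return size_gb / hours


slowdown = SLOW_HOURS / NORMAL_HOURS
normal_rate = throughput_gb_per_hour(CATALOG_GB, NORMAL_HOURS)
slow_rate = throughput_gb_per_hour(CATALOG_GB, SLOW_HOURS)

print(f"slowdown factor: {slowdown:.1f}x")
print(f"normal rate: {normal_rate:.1f} GB/hour")
print(f"observed rate: {slow_rate:.1f} GB/hour")
```

In other words, the catalog backup ran roughly 13 times slower than usual, at about 2.5GB per hour rather than about 33GB per hour.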
Acxiom had "major issues with MP1," says Lutgen. "Symantec allows you to run powerful commands against their catalog, but you better know what you're doing or it can cause corruptions."
Acxiom also ran into configuration issues during a DR test at its recovery center that delayed client restore by several hours. These problems were related to another new feature in Veritas NetBackup 6.0, the Enterprise Media Manager (EMM), which consolidates Veritas NetBackup media and device databases onto one server to simplify recovery in a DR scenario. One issue had to do with some aliases Acxiom had configured at its home site that weren't configured at the recovery site. The other issue was a switch set in the EMM database that indicated Veritas NetBackup was in an upgrade process and prevented further configuration (adding devices such as tape robots or tape drives). To their credit, says Lutgen, Symantec technicians resolved the problems during Acxiom's DR test. "The problems were 6.0 related ... the fix involved us issuing commands against the new database," he adds.
Fortunately, MP2, which fixed the catalog backup and vault bug the company discovered, was released a week after Acxiom installed MP1. "We are operational and in no rush to upgrade to MP3," notes Lutgen.
Version 6.0 changes the catalog recovery process of Veritas NetBackup so significantly that many customers are still performing an offline (cold) catalog backup in case the online catalog recovery doesn't work or, worse, causes data corruption issues.
After hearing similar stories, Hackland decided to perform his upgrade to MP2 in two steps. "We felt that was safer," he says. He upgraded the master server (which includes the catalog) in the first part of the week. "That's the one thing that worried me because I'd heard about issues with the catalog," he says. However, he notes, the firm's catalog isn't particularly big (approximately 100MB). "If there were going to be major problems, I thought it was going to be at that stage."
Not so, but there were other surprises. During the upgrade process, users must disable all their policies, perform the upgrade and then re-enable the policies. The Renault F1 Team discovered a problem when it came to reactivating its policies. A script that should have reactivated all the policies didn't work, so an admin had to reactivate all the policies manually. It ended up taking almost a day to upgrade the master server, considerably longer than the company had expected.
Engineers then performed three test backups and three test restores, which all worked fine. "Off they went home, happy as Larry," says Hackland. But when they returned to work the following day, they discovered that none of the scheduled backups had run. "I am still not entirely clear what the issue was," says Hackland. Symantec told him it was due to a "storage unit problem"; storage units are a part of Veritas NetBackup. He says Symantec engineers corrected the issue that day by stopping and restarting NetBackup, which they had done a number of times during the installation.
"That was a little nerve-wracking, coming in the next morning and getting the report that none of the backups had run," he says. "The rest of the upgrade that week was fine ... It took an age to import the media database, but there were no other issues on that day."
NetBackup Operations Manager
The final part of Renault F1's upgrade was to replace Advanced Reporter and Global Data Manager with the new reporting tool in Veritas NetBackup 6.0 called NetBackup Operations Manager (NOM).
"NOM is much easier to use; it's easier to see if backups have failed," says Hackland. However, there's no migration path between Advanced Reporter and NOM, so all of the historical data collected by Advanced Reporter is lost once NOM is installed. "I wish there had been some way of archiving the old Advanced Reporter stuff ... It's not a major deal because we have the information anyway, but it seems a strange thing that you couldn't go back and look at the historical logs," he says.
According to Symantec, Advanced Reporter wasn't serving the needs of its users and had to be overhauled. "It was a design question. Did we want features x, y and z, or did we want a very tight link in with the past?" says Mike Adams, Symantec's senior group manager, product marketing, Veritas NetBackup. "We thought it best to add feature sets to NOM [rather] than to create a vast link from Advanced Reporter to Operations Manager."
Symantec's Kixmoeller adds that historical data is really only used to see how well the current backup environment is performing. "When you move to [NetBackup] 6, you have changed your environment and you'll want to look at a new baseline of data," he says.
Flavio Hürlimann, IT storage specialist at Sunrise/TDC Switzerland AG, a major telecom provider in Zurich, says NOM is an improvement over the previous reporting tools in Veritas NetBackup, but he feels it still doesn't meet requirements. Because it's Java-based, it takes a while to build screens, "so you're not sure if you have the most current information," he says. Sunrise/TDC Switzerland uses a reporting tool from Agite Software AG called backupVisual, which Hürlimann says creates more detailed reports and presents data more graphically than NOM.
Symantec is aware of the need for more detailed reporting on Veritas NetBackup and is adding more functionality to NOM. Furthermore, Symantec plans to improve the logging feature that records all of the events happening in Veritas NetBackup. Version 6.0 adds more detail to the logging records to aid troubleshooting, according to Symantec. But beware: The logs take up much more disk space because they're gathering more information.
"There have been some isolated issues of customers running out of disk space and that's causing particular issues," says Symantec's Kixmoeller.
In these instances, Symantec offers two options: Create a larger disk pool for the logs if you need that level of detail or wait for Maintenance Pack 4 (MP4), which will let you turn down the verbosity of the logs to save space. "It comes down to whether the user wants more information or to be more conservative with disk space," says Kixmoeller. MP4 is expected in the fall.
Some users have seen their EMM databases corrupted when logging is turned on and the disk fills up. The MP3 alert from Symantec advises users to run logging on a dedicated file system to prevent this corruption. Users report that reading the logs takes days now rather than hours because they're so long.
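The advice to keep logs on a dedicated file system can be checked with a short script. The sketch below is a minimal example using only the Python standard library; it isn't a Symantec tool, the 10% free-space threshold is an arbitrary assumption, and the directory paths passed in are placeholders that an administrator would replace with the real log and database locations.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """Return True if both paths live on the same file system (same device ID)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def log_volume_status(path, min_free_fraction=0.10):
    """Report free space on the volume holding `path`.

    `min_free_fraction` is an assumed warning threshold, not a vendor figure.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


# Placeholder paths: substitute the actual log and database directories.
log_dir = "."
db_dir = "."

if same_filesystem(log_dir, db_dir):
    print("Logs share a file system with the database; consider separating them.")

status = log_volume_status(log_dir)
if status["low"]:
    print("Warning: log volume is running low on free space.")
```

Run on a schedule, a check like this flags both conditions users have described: logs sharing a volume with the database, and that volume filling up.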
|Top five best practices for upgrading to Veritas NetBackup 6.0|
Help arrives onsite
One user, who requested anonymity because his company has a policy of not endorsing or critiquing products from vendors with which it has relationships, spent days going through logs to figure out where his Veritas NetBackup implementation was failing. "Nothing gets done [to fix the problem] until you send the logs. In the end, Veritas couldn't replicate our problems in their environment, so they sent people to our place," he says.
The company has 170 master servers--60 running Veritas NetBackup 6.0 MP2--and almost 2 petabytes (PB) of storage under management. After running into major catalog consistency and corruption problems during the upgrade process, the company opted to spend $20,000 for a Veritas consultant, who found all the discrepancies on its master servers; the company was then able to proceed with the upgrade. "That was a special deal for us as we are a large customer," notes the user.
According to Symantec, a free tool called NetBackup Catalog Consistency (NBCC) can be used prior to an upgrade to Veritas NetBackup 6.0 to check the catalog consistency. That tool wasn't available when this user upgraded in mid-March (see "Top five best practices for upgrading to Veritas NetBackup 6.0," at right).
The user plans to install MP3, and hopes it will fix a shopping list of other problems, including an issue with multistreaming, "which was the premier feature of NetBackup and now we can't even use it anymore." Symantec's response: When using Veritas NetBackup 6.0, 6.0 MP1 and 6.0 MP2, if a multiplexed backup receives an end-of-media message and another backup is sent to the same drive before the first can get new media, that second backup will fail and there's the potential for data loss. The company says MP3 fixes this problem.
The user also went through considerable pain when installing Veritas CommandCentral agents on Veritas NetBackup servers, which caused NetBackup to stop recognizing the company's tape drives. They had to completely uninstall (rather than disable) the agents to fix the problem, which caused backups to fail for a day.
"The biggest concern that we want to see fixed is the error-code 200 problems that we're having," says the user. "If a parent job fails or hangs, nothing else runs. Also, if a manual backup is kicked off, the policy will not resume running the next day for some reason."
|Experts advise users to delay upgrading|
Symantec Corp. isn't the first company to rearchitect its product from a flat-file to a database architecture. Microsoft Corp. attempted a similar feat with WinFS, in which it was integrating unstructured data into a relational database. WinFS reached beta after several years of development, but Microsoft canned the project in June 2006. In a blog posted on the Microsoft Web site, the company announced it would no longer be "pursuing a separate delivery of WinFS," instead choosing to integrate it into the next release of SQL Server.
"Converting a basic file system into a more organized structure like a database is no mean feat," says Ash Ashutosh, CTO of Hewlett-Packard Co.'s storage management software group and founder of AppIQ. "Everyone understands files, but databases are a whole different beast," he says. "The tools are a lot geekier."
In Symantec's case, the company picked the Sybase database because other Symantec products use it, but also because Sybase made its product look like a file system to the app, according to Ashutosh. He says this would have meant fewer changes to the Veritas NetBackup code, but significant testing between NetBackup and the Sybase database. Ashutosh says Symantec should have kept Veritas NetBackup Version 6.0 in "QA and test for a good 24 months." Symantec shipped NetBackup 5.1 on June 7, 2004, and NetBackup 6.0 on October 3, 2005. (HP's OpenView Storage Data Protector product competes with Symantec's Veritas NetBackup.)
According to industry analysts, all software companies, especially the larger players, are under considerable internal pressure to meet release dates that have more to do with meeting quarterly earnings than releasing a solid product. There's also pressure from customers demanding new features. The upshot is buggier software. Experts advise not to upgrade to a new release until absolutely necessary.
Growing bug list
The list of bugs goes on and on. So far, Symantec has fixed 100 bugs in MP3, which a spokesperson says "is very typical for a quarterly maintenance pack and a product that has as many capabilities and supported components as NBU [NetBackup]." It's no surprise many users are holding off upgrading to Version 6.0 on the advice of Veritas NetBackup resellers, who are telling people to wait until Version 6.5 (expected in the first quarter of next year) for the product to become stable (see "Experts advise users to delay upgrading," at right).
Symantec declined to comment on how many bugs it has discovered in Veritas NetBackup Version 6.0. "We don't get an accurate report on the number of bugs," says the firm's Adams. "When any new release comes out, there will be any number of bugs ... We never got any highlight that there were any more [bugs] with [Version] 6 than with previous versions of NetBackup."
"There are substantial leaps forward [in Veritas NetBackup 6.0]," adds Symantec's Kixmoeller, "but it's important to realize they're substantial changes, and it takes some investment on the customer's part to learn about the new tools and properly plan for the upgrade."
A former Veritas engineer, who quit Symantec in March because of the Veritas NetBackup problems, says Version 6.0 was probably released too early and was still essentially a beta product.
"There were several Sybase bugs, with config files getting spontaneously corrupted; the worst part was that the techs weren't adequately trained on Sybase to understand the problems," she says, preferring to remain anonymous because she still works in the industry.