
Why deleting the DAOS catalog is a bad idea

Technote from IBM Support

Question

Is deleting daoscat.nsf and/or daos.cfg harmful?

Answer

Overview

The two DAOS control files are daoscat.nsf and daos.cfg. In general, it is a very bad idea to delete them. Doing so can lead to widespread attachment unavailability, unnecessary duplication of data, unnecessary traffic to daoscat.nsf, and difficulties in restoring .NLO files.

 

Background

A quick tour of some aspects of DAOS:

The daos.cfg file contains a list of the directory paths under the DAOS repository root, and a count of the number of .NLO files stored in each one. The count information is used to ensure that a limited number of .NLO files are stored in each directory for filesystem performance reasons.
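The role of the per-directory counts can be illustrated with a minimal Python sketch. It is not DAOS code: the directory names, the limit value, and the "first directory with room" selection rule are all assumptions made for illustration, since the technote does not describe how DAOS chooses a directory.

```python
# Hypothetical in-memory view of daos.cfg: an ordered list of
# (directory path, .NLO count) pairs. Names and values are invented.
def pick_directory(entries, limit):
    """Return the index of the first directory holding fewer than `limit` files.

    Returns None when every directory is full; a real system would
    presumably create a new subdirectory at that point.
    """
    for index, (_path, count) in enumerate(entries):
        if count < limit:
            return index
    return None


entries = [("dir_a", 1000), ("dir_b", 420)]
chosen = pick_directory(entries, limit=1000)       # dir_a is full, dir_b has room
all_full = pick_directory([("dir_a", 1000)], limit=1000)
```

Because the counts only steer new files away from full directories, they can be approximate without affecting correctness.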

The daoscat.nsf file contains three major items:


    1) The DAOS Object Index (DOI) which is a list of the key (.NLO filename), reference count, and location of every .NLO file in the repository.

    2) The DAOS ID Table (DIT) which is a list of all NSF files participating in DAOS, with the total number of NLO references for each .NSF file.

    3) The deletion list which is a collection of the keys that have 0 references, with the datestamp of when the last reference went away.


When an attachment is stored in DAOS, a 'ticket' is written to the NSF as a placeholder. This ticket contains the key for the attachment, as well as a location hint. If the hint is incorrect, the DAOS catalog is consulted to look up the key to find the current location for the .NLO file.
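The hint-then-catalog lookup described above can be sketched in Python as follows. This is an illustrative model only: the `Ticket` class, the dictionary-based DOI, and the `repository` mapping are hypothetical stand-ins, not DAOS data structures.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    key: str   # key of the attachment (the .NLO filename)
    hint: str  # location where the .NLO file was last known to live


def locate_nlo(ticket, doi, repository):
    """Return (location, catalog_consulted) for the .NLO named by a ticket.

    `repository` maps a location to the set of keys stored there (a stand-in
    for the filesystem); `doi` maps a key to its current location.
    """
    # Fast path: the hint in the ticket is still correct.
    if ticket.key in repository.get(ticket.hint, set()):
        return ticket.hint, False
    # Slow path: the hint is wrong, so look the key up in the catalog.
    location = doi.get(ticket.key)
    if location is not None and ticket.key in repository.get(location, set()):
        return location, True
    return None, True


repository = {"dir_a": {"abc"}, "dir_b": {"def"}}
doi = {"abc": "dir_a", "def": "dir_b"}

good = locate_nlo(Ticket("abc", "dir_a"), doi, repository)   # hint is right
stale = locate_nlo(Ticket("def", "dir_a"), doi, repository)  # hint is wrong
lost = locate_nlo(Ticket("def", "dir_a"), {}, repository)    # empty catalog
```

In this model a wrong hint costs one extra catalog lookup, and a wrong hint combined with an empty catalog leaves the file unlocatable.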

 

If daos.cfg is deleted:

If there is no daos.cfg file when the API is started, DAOS will begin at the root of the DAOS repository and enumerate the files in each subdirectory. This is a relatively costly filesystem operation, and can take a significant amount of time, possibly several hours if there are a large number of .NLO files in the repository.

This startup delay often appears to be a hang, and a common reaction is to kill the process and start it again. If the counting operation is interrupted in this manner, it's possible for an incomplete daos.cfg file to be produced.

All location references in the DOI and in the DAOS tickets are a numeric index into the list of paths in this file. If the file is incomplete and no path exists at the requested index, or if the wrong path is listed at the index, the resulting path constructed for the .NLO file access will be invalid, and the access will fail.
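A small Python sketch shows why an incomplete path list breaks access. The path names, the `.nlo` filename layout, and the `resolve_path` helper are assumptions for illustration; the real daos.cfg format is not described here.

```python
import posixpath


def resolve_path(paths, index, key):
    """Build the path of an .NLO file from a numeric index into `paths`."""
    if not 0 <= index < len(paths):
        raise LookupError(f"no path at index {index}")
    return posixpath.join(paths[index], key + ".nlo")


complete = ["dir_a", "dir_b", "dir_c"]
truncated = complete[:2]  # counting was interrupted before dir_c was listed

ok = resolve_path(complete, 2, "key1")

try:
    resolve_path(truncated, 2, "key1")
    failed = False
except LookupError:
    failed = True  # the stored index no longer points at any path

# A list with the wrong path at an index fails more quietly: the path
# is well formed but names a directory that does not hold the file.
wrong = resolve_path(["dir_a", "dir_c", "dir_b"], 2, "key1")
```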

If nothing else, rebuilding this file is usually just a waste of time and machine resources. The file counts here do not have to be exact, and unless the file is corrupted, a full data restore is being performed, or the file is otherwise known to contain significantly incorrect information, there is no benefit to re-counting all of the files and re-creating it.


 

If daoscat.nsf is deleted:

DAOS will create a new daoscat.nsf if one does not exist at API startup. This file will contain an empty DIT, DOI, and deletion list.

When a new attachment is received by DAOS, it calculates the key (checksum) of the contents, and then looks up the key in the DOI to see if the attachment already exists. If it does not already exist, a new entry is created, and the location of this new .NLO file is stored in the DOI and in the DAOS ticket in the NSF. If the DOI was empty due to daoscat being deleted, DAOS will not be able to locate a previously-existing copy of this new .NLO file, so the new one will be a duplicate. This duplicate .NLO is a waste of disk space.
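The effect of an empty DOI on deduplication can be modeled with a short Python sketch. It assumes SHA-256 as a stand-in checksum (the technote does not say which algorithm DAOS uses), and the dictionary-based DOI, the `repository` list, and the directory names are hypothetical.

```python
import hashlib


def store_attachment(content, doi, repository, directory):
    """Store `content` once per key and return a ticket (key plus hint)."""
    key = hashlib.sha256(content).hexdigest()  # stand-in for the DAOS key
    entry = doi.get(key)
    if entry is not None:
        entry["refs"] += 1                     # already stored: add a reference
    else:
        repository.append((directory, key))    # write a new .NLO file
        entry = {"refs": 1, "location": directory}
        doi[key] = entry
    return {"key": key, "hint": entry["location"]}


payload = b"same attachment bytes"

# Catalog intact: the second store finds the key and writes nothing new.
intact_doi, intact_repo = {}, []
store_attachment(payload, intact_doi, intact_repo, "dir_a")
store_attachment(payload, intact_doi, intact_repo, "dir_b")

# Catalog deleted between the two stores: the lookup misses, so a
# duplicate .NLO file is written and the ticket points at the duplicate.
repo = []
first = store_attachment(payload, {}, repo, "dir_a")
second = store_attachment(payload, {}, repo, "dir_b")
```

In the second scenario `repo` ends up holding two files with the same key, which is the duplicate that a later resync removes.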

When DAOS resync is run at some point later, the DOI will be populated, and the duplicate will be detected and deleted. However, the ticket in the NSF will not be updated, so the location hint will be incorrect. When this attachment needs to be read, DAOS will first look at the hint location in the ticket, which will fail. It will then use the key to look in the DOI for the correct location of the .NLO file, which will succeed. The retry logic allows access to the attachment, but it increases the traffic on daoscat.nsf.

As of 8.5.2, DAOS resync keeps the old DOI active while it is populating a new one. This allows DAOS to look up the location of .NLO files correctly during the resync. An empty DOI is of no use in this respect.

If it is necessary to restore an .NSF file, the following command will display a list of the .NLO files that do not currently reside in the DAOS repository, and need to be restored for all attachments to be available:

LISTNLO MISSING

The output of this command is based on the hint location in the DAOS tickets. If the hint location is incorrect because a duplicate .NLO file was deleted, it will be more difficult to restore the appropriate .NLO files since they won't have been backed up at the location displayed.

DAOS resync will also re-populate the deletion list. All entries added to the deletion list will be datestamped with the time of the resync. The DAOS prune operation will wait until the datestamps are older than the deferred deletion interval, so resetting the datestamps of these entries will delay the prune operations, causing unnecessary disk space usage.
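The delay can be made concrete with a minimal Python sketch. The 30-day interval, the keys, and the dates are invented for illustration, and `prunable_keys` is a hypothetical helper rather than a DAOS function.

```python
from datetime import datetime, timedelta


def prunable_keys(deletion_list, now, deferred_interval):
    """Return the keys whose datestamp is older than the deferred interval."""
    return sorted(
        key for key, stamp in deletion_list.items()
        if now - stamp > deferred_interval
    )


now = datetime(2024, 1, 31)
interval = timedelta(days=30)  # assumed deferred deletion interval

# Original deletion list: key1 lost its last reference 40 days ago.
original = {"key1": now - timedelta(days=40), "key2": now - timedelta(days=5)}

# After the catalog is deleted and resynced, every entry is stamped "now".
after_resync = {key: now for key in original}

ready_before = prunable_keys(original, now, interval)
ready_after = prunable_keys(after_resync, now, interval)
ready_later = prunable_keys(after_resync, now + timedelta(days=31), interval)
```

With the original datestamps, key1 is eligible for prune immediately; after the reset, nothing is eligible until the full interval has elapsed again.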

Even if the DAOS catalog is not in Synchronized state, it contains valuable information. Again, unless the file is corrupted, a full data restore is being performed, or the file is otherwise known to contain significantly incorrect information, there is no benefit to re-creating it.


 

Recovering from deleted files:

If daos.cfg is deleted, it will be recreated at startup. It is crucial that the creation process be allowed to complete so that a correct daos.cfg is produced.

If daoscat.nsf is deleted, the server should not be operated until a DAOS resync has been performed. The safest way to do this is to run a standalone DAOS resync before the server is started.

On 8.5.2 and newer environments, use the following command to get to the minimum usable state before the server is operated:

RESYNC QUICK

This option will populate the DIT and DOI, but will not update the reference counts. All DAOS functionality will be completely operational after a RESYNC QUICK except for prune. Prune will not run until the reference counts have been updated by a normal resync, and the DAOS catalog is in Synchronized state.

Once the DAOS catalog is in Synchronized state, a fixup operation on an NSF will update the hint location of the DAOS tickets in the NSF with the current information from the DOI.


 

Is it ever OK to delete these files?

If all data is being restored (as in the case of a disaster recovery scenario) and all .NSF and .NLO data is being recovered from backup, the daos.cfg should be deleted, and an offline resync (or RESYNC QUICK at a minimum) should be performed to get an accurate accounting of all files and references. These two files should NOT be restored from the backup in this situation.

If either of these files was accidentally copied or restored from a different server or timeframe, or if either is corrupted beyond repair, deleting it (and recreating it as described above) is an acceptable response.

In any other situation, you're better off leaving the existing files in place and doing a resync.



 
 
