Let me give a warm up before jumping into how this was understood.
A very generic introduction to the environment, Long story short -
- 2 databases in Environment
- 1 for live application, 1 for reporting
- live application database gets restored to the reporting database everyday before the business starts every day.
- Soon after the restore happened we started seeing log archival failures all of a sudden from no where
2016-02-11-06.22.26.070851-360 I100204A452 LEVEL: Warning
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9543 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
Few things/pattern we noticed with this issue are
- These errors started soon when the Rollforward phase of the restore started- quite obvious as the logging activity starts right from the Rollforward.
- Every time the archival failure is happening, db2logmgr is failing trying to find different log files to get them archived like below.
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9543 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9544 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9545 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9546 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
We did a force archive to see if this will any way sort the issue
db2 archive log for db SAMPLE
And it did resolved the issue and log archivals are getting completed successfully thereafter, but how long? till the next day restore.
Yes, this issue came up again in the next day daily restore, although the issue is being solved by doing a force archive we are more interested in RCA of this and raised an SR with IBM to find the dark secrets and internals of log manager and how and why is db2logmgr is assuming some log file which does not even exist to archive?
Initially IBM said that’s a warning message and be completely ignored, but we do have alerting system quite well integrated in the environment and serious though in place we decided to further work with IBM on why is this popping up rather than suppressing the alert.
So when further worked with IBM on this, there came some interesting conceptual facts of the db2logmgr.
Below are the set of Questions from us and answers from IBM specialist.
Question - How long will the below warning messages appear
Log Manger first calls sqlpgArchivePendingLogs to archive any log file
that is pending to be archived. If it fails to archive any of the log
files, the log file will be put into the archive array.
2015-12-10-06.19.59.222222-360 E86409A379 LEVEL: Info
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile,
probe:3108
1830 DATA #1 : <preformatted>
Started archive for log file S0009543.LOG.
2015-12-10-06.19.59.222549-360 E86789A391 LEVEL: Warning
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile,
probe:3150
MESSAGE : ADM1848W Failed to archive log file "S0009543.LOG" to ""
from "".
2015-12-10-06.19.59.222873-360 E87181A447 LEVEL: Warning
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile,
probe:3155
DATA #1 : <preformatted>
Failed to archive log file S0009543.LOG to from
/var/udb/logs1/db2/SAMPLE/ return code -2045837302.
2015-12-10-06.19.59.223014-360 I87629A385 LEVEL: Warning
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services,
sqlpgArchivePendingLogs, probe:1500
MESSAGE : Log archive failed with rc -2045837302 for LOGARCHMETH1.
Afterwards, Log Manager will scan the archive array through function
sqlpgScanArchArray and retry archiving any failed attempts. The time
between the first failed attempt and retry is determined by the config
parameter archretrydelay.
At this time, archiving is handled by function sqlpgRetryFailedArchive.
SQLO_FNEX will be returned if Log Manager could not find the file and
will assume the log file has already been archived. Therefore no further
archive attempts will be made.
2015-12-10-06.20.26.070851-360 I100204A452 LEVEL: Warning
PID : 11469034 TID : 3678 PROC : db2sysc 0
INSTANCE: db2 NODE : 000
EDUID : 3678 EDUNAME: db2logmgr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services,
sqlpgRetryFailedArchive, probe:4760
MESSAGE : Unable to find log file 9543 for LOGARCHMETH1 using
method 4 and<
target . Assume log was archived and continue.
Question - We noticed that once we issue force archive we see no warning popping up again – what I'm wondering here is how DB2 logmgr suddenly came to know that the LOG file is already existing.
Once Log Manager could not find the log files while it is in
sqlpgRetryFailedArchive, it will update firstFailedArch1 or
firstFailedArch2 (depends on which archive method you are using) to a
higher log extent number.
firstFailedArch is used for determining the first log file Log Manger
failed to archive. Later on when you forces the log files to be
archived, Log Manager will not check for the missing logs since they are
not pending to be archived and
firstFailedArch1 or firstFailedArch2 already moved up.
Thanks to that IBM Specialist who just revealed some interesting notes on how db2 internally handles the log array.
No comments:
Post a Comment