Oracle Database

Automatic Trail File Recovery goldengate 26ai read&learn gg

Photo by AarΓ³n Blanco Tejedor on Unsplash

ERROR OGG-02171 Error reading LCR from data source. Status 500, data source type TrailDataSource.
ERROR OGG-01733 Trail file header file size value 15114838 for trail file '/deploy_target/var/lib/data/DBTARGET/re000000008' differs from the actual size of the file (15121254).

The target site’s Replicat process abended with the error above.. Usual restarts of extract , replicat and distpath didn’t fix the issue . A corrupt trail file re000000008 stuck the target Replicat. Cause of this error is a sudden crash of the database host. The Fix : Automatic Trail File Recover

A little theory , Automatic trail file recovery feature was 1st introduced in Oracle Goldengate12c in both classic and microservices architecture. If a trail file on the target system becomes corrupt , damaged or is missing ,you need to delete it and all its subsequent trail files and restart the distribution path on source. On restarting the distribution path , it automatically detects the missing sequences on target . It then uses checkpoint information and source trail files (generated by extract) to rebuild the trail files on target.

Pro Tips : 
Instead of deleting the trail files make a backup folder and move it there .
On the source Either keep PURGEOLDEXTRACTS MINKEEP  to a higher value or create purge tasks on web UI to greater than a day
Don't use NOFILTERDUPTRANSACTIONS on target replicat while starting up , it may allow duplicate entries.
You may need to restart source distribution path more than once and wait.

Simple demo as we discussed in the theory

#Target System
>cd $DEPLOY_HOME/var/lib/data/SOURCEDB
>ls -rlth
-rw-r----- 1 himmy oracle 2.0M May 31 12:24 re000000003
-rw-r----- 1 himmy oracle 2.0M Jun  1 06:31 re000000004
-rw-r----- 1 himmy oracle 2.0M Jun  1 16:09 re000000005
-rw-r----- 1 himmy oracle 2.0M Jun  2 09:22 re000000006
-rw-r----- 1 himmy oracle 2.0M Jun  3 01:14 re000000007
-rw-r----- 1 himmy oracle 1.0M Jun  3 13:28 re000000008
-rw-r----- 1 himmy oracle 2.0M Jun  3 22:23 re000000009
-rw-r----- 1 himmy oracle 2.0M Jun  3 23:12 re000000010

#Source System
>$GG_HOME/bin/adminclient
>stop distpath <distpathname>

#Target System

> mkdir bkp_trail
> mv re000000008 bkp_trail
> mv re000000009 bkp_trail
> mv re000000010 bkp_trail

#Source System
>start distpath <distpathname>

As visible from above , the trail file re000000008 was formed of size 1Mb . It was being written when the server crashed , the rest sequences copied just fine. Stopped the distptath on the source system , moved the corrupted trail file to a backup location and started the distpath on the source system.

Automatic trail file recovery kicks in and will start rebuild the re000000008 target trail file and its subsequent trails.

#Pro Tip:
If trail file recovery is not started then try restarting DISTSRVR on source and RECVSRVR service on target
Syntax: 
> stop service DISTSRVR 
> start service DISTSRVR 

Warning : Before starting automatic trail file recovery be mindful if source trail files are present or not . If any doubt please engage oracle support services.

Ref : https://docs.oracle.com/en/database/goldengate/big-data/26/

Leave a Reply