Automatic Trail File Recovery goldengate 26ai read&learn gg

Photo by AarΓ³n Blanco Tejedor on Unsplash
ERROR OGG-02171 Error reading LCR from data source. Status 500, data source type TrailDataSource.
ERROR OGG-01733 Trail file header file size value 15114838 for trail file '/deploy_target/var/lib/data/DBTARGET/re000000008' differs from the actual size of the file (15121254).
The target site’s Replicat process abended with the error above.. Usual restarts of extract , replicat and distpath didn’t fix the issue . A corrupt trail file re000000008 stuck the target Replicat. Cause of this error is a sudden crash of the database host. The Fix : Automatic Trail File Recover
A little theory , Automatic trail file recovery feature was 1st introduced in Oracle Goldengate12c in both classic and microservices architecture. If a trail file on the target system becomes corrupt , damaged or is missing ,you need to delete it and all its subsequent trail files and restart the distribution path on source. On restarting the distribution path , it automatically detects the missing sequences on target . It then uses checkpoint information and source trail files (generated by extract) to rebuild the trail files on target.
Pro Tips :
Instead of deleting the trail files make a backup folder and move it there .
On the source Either keep PURGEOLDEXTRACTS MINKEEP to a higher value or create purge tasks on web UI to greater than a day
Don't use NOFILTERDUPTRANSACTIONS on target replicat while starting up , it may allow duplicate entries.
You may need to restart source distribution path more than once and wait.
Simple demo as we discussed in the theory
#Target System
>cd $DEPLOY_HOME/var/lib/data/SOURCEDB
>ls -rlth
-rw-r----- 1 himmy oracle 2.0M May 31 12:24 re000000003
-rw-r----- 1 himmy oracle 2.0M Jun 1 06:31 re000000004
-rw-r----- 1 himmy oracle 2.0M Jun 1 16:09 re000000005
-rw-r----- 1 himmy oracle 2.0M Jun 2 09:22 re000000006
-rw-r----- 1 himmy oracle 2.0M Jun 3 01:14 re000000007
-rw-r----- 1 himmy oracle 1.0M Jun 3 13:28 re000000008
-rw-r----- 1 himmy oracle 2.0M Jun 3 22:23 re000000009
-rw-r----- 1 himmy oracle 2.0M Jun 3 23:12 re000000010
#Source System
>$GG_HOME/bin/adminclient
>stop distpath <distpathname>
#Target System
> mkdir bkp_trail
> mv re000000008 bkp_trail
> mv re000000009 bkp_trail
> mv re000000010 bkp_trail
#Source System
>start distpath <distpathname>
As visible from above , the trail file re000000008 was formed of size 1Mb . It was being written when the server crashed , the rest sequences copied just fine. Stopped the distptath on the source system , moved the corrupted trail file to a backup location and started the distpath on the source system.
Automatic trail file recovery kicks in and will start rebuild the re000000008 target trail file and its subsequent trails.
#Pro Tip:
If trail file recovery is not started then try restarting DISTSRVR on source and RECVSRVR service on target
Syntax:
> stop service DISTSRVR
> start service DISTSRVR
Warning : Before starting automatic trail file recovery be mindful if source trail files are present or not . If any doubt please engage oracle support services.
Ref : https://docs.oracle.com/en/database/goldengate/big-data/26/