Extract process of oracle goldengate 26ai read&learn gg

Photo by James Baltz on Unsplash
The extract process in oracle golden gate microservices architecture extracts or captures the transactions from the source database to replay in the target database. It consists mainly of two kinds Initial Load and Online Extract.
Initial Load : When GoldenGate is setup for initial load , the extract process will capture the static data present in the source tables. However one can use oracle tools like datapump and rman to setup the target database , goldengate comes with its own process called initial load . We will discuss more in another blog
Online Extract : When you set up extract to just capture the transactions(DDL and DML) from source after initial load. In this mode extract will store transactions and as soon as it receives a commit it will store them to disk in “trail files”. Distribution servive transfers the trail files to the target system for replay. All trail files have a sequentially numbered order ( ex named as ea000000001 , ea000000002 , etc ) and are created in the order of commit. Pro Tip : If you performed your rman or datapump backup till a certain SCN number , start your online extract from that SCN.
Once ready, add an extract and make sure to follow the checklist for following pre-requistes -> Adding a database connection , checkpoint table , heartbeat table and trandata . ( Future blog posts will cover these prerequisites in detail. )
Here is a sample that i would run from the admin client
ADD EXTRACT extsource,INTEGRATED TRANLOG, BEGIN NOW
REGISTER EXTRACT extsource DATABASE
ADD EXTTRAIL ESOURCE/ea, EXTRACT extsource, MEGABYTES 500
Lets explain each command one by one :-
extsource is the name of the extract we have kept , you can also name it as extdbname
INTEGRATED TRANLOG , is an advanced capture method which interacts with logmining server running in the database , mining LCR ( Logical Change Records ) from the redo logs. LCRs are written to the trail files. An alternate option to Integrated Tranlog is just “Tranlog” which captures data changes by reading directly from redo logs in oracle , binlogs in mysql and wal logs in postgres.
BEGIN NOW , This sets the SCN to current timestamp and starts capturing transactions from there. It ignores older transactions in redo logs.
REGISTER EXTRACT extsource DATABASE , This command establishes connection between goldengate extract process and source oracle database. Incase of using INTEGRATED TRANLOG , it instructs the oracle db to start a logmining server to read the LCR . It prevents archivelogs from getting deleted until extract process reads them. If you’re running a CDB ( container database ) it allows single extract to read transactions from multiple PDB’s ( Pluggable Database)
ADD EXTTRAIL ESOURCE/ea, EXTRACT extsource, MEGABYTES 500 : It registers a directory called ESOURCE in the $DEPLOYMENT_HOME/var/lib/data and each trail file starts with prefix ea . GoldenGate maps these trail files to the extsource Extract and limits each file to a maximum size of 500 MB.. In case of a busy database you can keep 2gb trail file size.
Now lets discuss about the parameter file , here is a sample
extract extsource
USERIDALIAS admin DOMAIN OracleGoldenGate
EXTTRAIL ESOURCE/ea
TRANLOGOPTIONS EXCLUDEUSER admin
--IE parameters
TRANLOGOPTIONS EXCLUDEUSER SYSTEM
TRANLOGOPTIONS EXCLUDEUSER SYS
TRANLOGOPTIONS EXCLUDETAG 00
REPORTCOUNT EVERY 10 MINUTES, RATE
STATOPTIONS REPORTFETCH, RESETREPORTSTATS
REPORT AT 02:00
REPORTROLLOVER AT 02:00
DISCARDROLLOVER at 00:00
BR BRDIR /ogg_home/var/run/BR
CACHEMGR CACHESIZE 2GB, CACHEDIRECTORY /deploy_source/var/lib/data/CACHEDIRECTORY
TABLE SAMPLE.EMPLOYEE;
TABLE SAMPLE.ADDRESS;
USERIDALIAS admin DOMAIN OracleGoldenGate , this directs goldengate to connect to the database using the credentials securely stored in goldengate wallet by the alias name admin
TRANLOGOPTIONS EXCLUDEUSER SYSTEM,SYS , instructs goldengate to skip and transaction executed by system or sys user . Use case is to allow admins to run ddl or dmls which they dont want to flow downstream.
TRANLOGOPTIONS EXCLUDETAG 00 ,useful parameter of bidirectional replication . It prevents data loop telling target extract to skip transactions applied by target replicat to the database.
REPORTCOUNT EVERY 10 MINUTES, RATE , every 10mins extract will write status update to .rpt file ( $Deploy_home/var/lib/report) , how many records it has processed and at what rate per second
STATOPTIONS REPORTFETCH, RESETREPORTSTATS , REPORTFETCH is useful for commands like “stats extract extsource” , REPORTFETCH will give reports amount of fetch performed by extract. RESETREPORTSTATS will reset the report for each period so that you don’t get a cumulative report , instead you get report by hourly or daily separately.
REPORT AT 02:00 , REPORTROLLOVER AT 02:00 , This will daily generate statistics reports at 2AM and make sure to close the current report at 2AM and rolls over to a new report.
DISCARDROLLOVER at 00:00 , The discard files ( report of rows discarded from replication due to any issues ) are rolled over at midnight.
BR BRDIR $OGG_HOME/var/run/BR , this is a bounded recovery parameter , if your database has long running transactions and your extract crashes for any reason. Extract doesn’t have to go back and re read all open transactions , instead it checkpoints it to this directory and saves hours in recovery . Its not needed if your database has huge memory.
CACHEMGR CACHESIZE 2GB, CACHEDIRECTORY /deploy_source/var/lib/data/CACHEDIRECTORY , this allocates an artificial cache of 2Gb ( you can modify this value to your own need ) . If the transcations exceeds 2Gb RAM then it will safely page out to this disk location. This parameter is there to reduce I/O bottleneck.
Ref: https://docs.oracle.com/en/database/goldengate/core/26/index.html