Retail Pro Prism Replication Troubleshooting - Common Issues

This table of common replication troubleshooting issues was provided at the 2022 Retail Pro Prism 2 Workshop.

Each entry below lists the issue, its potential root cause, how to identify it, and the solution.

Issue: Missing/stuck data
Potential root cause: The initialization process is running or stuck.
Identify:
  • Look at the Connection Manager to see where the data is (sent? processing?), and check the producer/consumer cache tables for the data.
  • Check the replication status table on the store to see whether the init session is in progress, paused, or canceled.
Solution:
  • If data is still moving and processing, wait. Init takes priority over D2D.
  • If data isn't moving or processing, determine why, if possible.
  • Cancel the initialization if it is paused or can't resume.
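
The initialization checks above can be sketched as a small decision helper. This is a hypothetical Python sketch, not part of Prism: the status strings ("in progress", "paused", "canceled") and the `data_moving` and `can_resume` inputs are assumptions you would fill in yourself from the Connection Manager and the replication status table.

```python
from enum import Enum


class InitAction(Enum):
    WAIT = "wait"                # init is moving; it takes priority over D2D
    INVESTIGATE = "investigate"  # init is active but data is not moving
    CANCEL = "cancel"            # init is paused or cannot resume
    NONE = "none"                # no active init session


def triage_init_session(status: str, data_moving: bool, can_resume: bool = True) -> InitAction:
    """Map an init session's state to the next step (hypothetical status strings)."""
    state = status.strip().lower()
    if state == "paused":
        return InitAction.CANCEL
    if state == "in progress":
        if data_moving:
            return InitAction.WAIT
        # Not moving: find out why; cancel only if it cannot resume.
        return InitAction.INVESTIGATE if can_resume else InitAction.CANCEL
    return InitAction.NONE  # e.g. "canceled" or no session found
```

For example, `triage_init_session("In Progress", data_moving=True)` returns `InitAction.WAIT`.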

Issue: Missing/stuck data
Potential root cause: An error from a constraint or, potentially, a DB optimistic lock, once the defined retry count is exceeded.
Identify:
  • Enable log level 3 and resend the document to capture additional details, so you can determine exactly which constraint is failing.
  • Possible server performance issue.
  • An optimistic lock is usually the same record being sent more than once (for example, the same customer being resent over and over).
Solution: Correct the issue on the document, or resend the data.
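
To spot the optimistic-lock pattern (the same record resent repeatedly), you can count record IDs in an export of the cache tables. A minimal sketch, assuming you have already exported the record identifiers as a list; the function name and the default threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable


def resent_records(record_ids: Iterable[str], threshold: int = 2) -> dict[str, int]:
    """Return IDs seen at least `threshold` times, with their counts."""
    counts = Counter(record_ids)
    return {rid: n for rid, n in counts.items() if n >= threshold}
```

For example, `resent_records(["CUST-1", "CUST-2", "CUST-1", "CUST-1"])` returns `{"CUST-1": 3}`, pointing at the customer being resent over and over.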

Issue: Missing/stuck data
Potential root cause: A backlog of data on the POA/RIL producer tables.
Identify: Determine the root cause: performance, an init session in progress, or a locked queue cycling.
Solution:
  • Depends on the root cause.
  • If a locked queue is the issue, address it. Avoiding locked queues is key.
  • If an init session is in progress or stuck, see the notes on init priority and stuck init sessions in the first entry above.
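
One way to tell whether a backlog is draining or stalled is to sample the producer table row count at a fixed interval and compare the first and last samples. A rough sketch, assuming the counts come from a query you run yourself (not shown); the function name and labels are illustrative.

```python
def backlog_trend(row_counts: list[int]) -> str:
    """Classify producer-table row counts sampled at a fixed interval."""
    if len(row_counts) < 2:
        raise ValueError("need at least two samples")
    first, last = row_counts[0], row_counts[-1]
    if last == 0:
        return "empty"     # backlog cleared
    if last < first:
        return "draining"  # data is moving; waiting may be enough
    if last > first:
        return "growing"   # check performance, init sessions, locked queues
    return "stalled"       # nothing moving: look for a locked queue or stuck init
```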

Issue: Missing/stuck data
Potential root cause: A malformed custom JSON file (integrations).
Identify: Check the PrismMQ logs. Example:
  !Error | Data was not readable, likely a serializer error. Cannot report replication status details
Solution: Contact the developer.
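
Before raising the issue with the developer, a quick syntax check can point to the exact line and column of the problem. This sketch uses Python's standard `json` module; it only checks JSON syntax, so a file that passes may still be rejected by the PrismMQ serializer for other reasons. The function name is hypothetical.

```python
import json
from typing import Optional


def find_json_error(text: str) -> Optional[str]:
    """Return the location and reason of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```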

Issue: Missing/stuck data
Potential root cause: The DB data file has reached its size capacity (OS file size limit).
Identify: Seen in the DB logs (Oracle alert_rproods.log file) and possibly in the PrismMQ logs.
Solution: Add an additional data file (see RIL TTK).
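
To gauge how close a data file is to its ceiling, you can compare its size to the maximum the database allows. A rough sketch, assuming an Oracle smallfile tablespace, where each data file is commonly limited to 2^22 - 1 blocks (the operating system's file-size limit may be lower). The current size and block size would come from a query such as one against DBA_DATA_FILES, which is not shown; the function names are hypothetical.

```python
# Assumption: Oracle smallfile data files are commonly capped at 2**22 - 1 blocks;
# the OS file-size limit can be lower.
SMALLFILE_MAX_BLOCKS = 2**22 - 1


def max_datafile_bytes(block_size: int = 8192) -> int:
    """Upper bound on a single smallfile data file for the given block size."""
    return SMALLFILE_MAX_BLOCKS * block_size


def datafile_pct_used(current_bytes: int, block_size: int = 8192) -> float:
    """Percentage of that upper bound a data file currently occupies."""
    return 100.0 * current_bytes / max_datafile_bytes(block_size)
```

With 8 KB blocks, `max_datafile_bytes(8192)` is 34,359,730,176 bytes, just under 32 GiB.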

Issue: Missing/stuck data
Potential root cause: Corrupt RabbitMQ Mnesia DB files.
Identify: Logging in to the RabbitMQ management console gives an error, even after restarting the RabbitMQ service.
Solution:
  • Delete the RabbitMQ queues in the queue folder until you find the corrupt queue or queues.
  • Queue folder (where "hostname" is the server's host name):
    C:\ProgramData\RetailPro\Server\RabbitMQ\db\rabbit@"hostname"mnesia\msg_stores\vhosts\628WB79CIFDYO9LJI6DKMI09L\queues
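
The folder path above can be assembled and its queue subfolders listed with a short script, which makes it easier to work through the queues one at a time. A sketch, assuming you read the rabbit@... folder name from disk (it varies by host) and that the vhost folder matches the one shown above; the function names are hypothetical.

```python
from pathlib import Path, PureWindowsPath

# Base folder and vhost folder name taken from the path above.
RABBITMQ_DB = PureWindowsPath(r"C:\ProgramData\RetailPro\Server\RabbitMQ\db")
VHOST_DIR = "628WB79CIFDYO9LJI6DKMI09L"


def queues_folder(mnesia_dir: str) -> PureWindowsPath:
    """Build the queue folder path; `mnesia_dir` is the rabbit@... folder name found on disk."""
    return RABBITMQ_DB / mnesia_dir / "msg_stores" / "vhosts" / VHOST_DIR / "queues"


def list_queue_dirs(queues: Path) -> list[str]:
    """List the queue subfolders (files are skipped), sorted by name."""
    return sorted(p.name for p in queues.iterdir() if p.is_dir())
```

On the server itself, pass `Path(str(queues_folder(...)))` to `list_queue_dirs`.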

Issue: Process out of memory
Potential root cause: Memory limitation: the 32-bit memory address limit (about 1.8 GB maximum).
Identify: In Task Manager (Details tab), note the peak or current memory usage of the PMQ processes.
Solution: Depends on the process. In most cases, restart the service.

Issue: Process out of memory
Potential root cause: The memory limit is reached because of:
  • A customer UDF
  • An extremely large document
Identify: Check the message size in the producer cache/consumer cache tables.
Solution: Reduce the number of consuming threads. Don't send the offending data.
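
To find the offending message, rank cache-table rows by payload size. A minimal sketch, assuming you have exported (message ID, payload) pairs from the producer or consumer cache table; the export step and the function name are hypothetical.

```python
from typing import Iterable, Union


def largest_messages(
    rows: Iterable[tuple[str, Union[str, bytes]]], top_n: int = 5
) -> list[tuple[str, int]]:
    """Return (message_id, size_in_bytes) for the largest payloads, biggest first."""
    sized = [
        # Text payloads are measured as UTF-8 bytes; bytes payloads as-is.
        (msg_id, len(payload.encode("utf-8") if isinstance(payload, str) else payload))
        for msg_id, payload in rows
    ]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top_n]
```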

Issue: RabbitMQ connections are lost repeatedly, with possible lost messages or initialization failures.
Potential root cause:
  • Known issue (fixed in Retail Pro Prism 1.14.7).
  • RabbitMQ queue setup: the heartbeat check is out of sync with the connection timeout.
Identify:
  • Seen in the RabbitMQ logs every couple of seconds, over and over, on all connected systems:
    Client unexpectedly closed TCP connection
Solution: Upgrade to Retail Pro Prism 1.14.7.2153 or later.
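
To confirm the repeated disconnects, you can count the "client unexpectedly closed TCP connection" entries per client address in a RabbitMQ log. A sketch, assuming entries of the form `closing AMQP connection <...> (client:port -> server:port):` followed by that message; the exact format varies by RabbitMQ version, so adjust the pattern to your logs. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed log shape; the message may follow on the next line, hence \s*.
PATTERN = re.compile(
    r"closing AMQP connection <[^>]*> \((?P<client>[^:\s]+):\d+ -> [^)]*\):"
    r"\s*client unexpectedly closed TCP connection"
)


def unexpected_closes(log_text: str) -> Counter:
    """Count unexpected TCP closes per client address."""
    return Counter(m.group("client") for m in PATTERN.finditer(log_text))
```

High counts on every connected system, arriving every few seconds, match the pattern described above.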

Issue: Preferences overwritten
Potential root cause: Core resources replicated from the store to the POA.
Identify:
  • A new store was published before joining the enterprise.
  • Changes made at the store replicate to the POA.
  • The scheduler triggers core resources to be sent with some tasks (Update Active Season).
Solution:
  • The Retail Pro Prism 2.1 release can turn off core resources (in the PMQ config file).
  • Disable the "update active season" scheduler task (set active = 0) on all store servers, ideally before joining the enterprise.
  • Clean out producer_cache before joining the enterprise.

Issue: Data stuck in RabbitMQ on the sending side
Potential root cause: Firewall.
Identify: Check whether you can establish a telnet connection, or otherwise verify that the ports are open.
Solution: Correct the firewall setup.

Issue: Data stuck in RabbitMQ on the sending side
Potential root cause: A store server networking issue.
Identify: Ping, or attempt to establish any connection to, the store/receiving server.
Solution: Correct the network issue.
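
Where telnet is not installed, a TCP connect test checks the same thing for both the firewall and the networking case. A minimal sketch using Python's standard `socket` module; the function name is hypothetical, and the host name and port in the usage line are placeholders (5672 is RabbitMQ's default AMQP port, but your installation may use others).

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For example, `port_open("store-server", 5672)` returns False when a firewall blocks the port or the host cannot be reached.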

Issue: Join Enterprise error: Invalid controller data
Potential root cause: Invalid controller data (restoring or reinstalling a previously joined system).
Identify: Most likely, the controller table has a record of this system with a different SID, or with the same SID and controller ID. Other issues are also possible (see KB).
Solution: See KB: Resolving Invalid Controller Data Error in RP Prism's Enterprise Manager.

Issue: Join Enterprise error: Invalid controller data
Potential root cause: The init of core resources failed or is stuck.
Identify:
  • The join fails or gets stuck on the last step, where it initializes the core resources.
  • Check the replication_status table and the producer/consumer cache tables to determine whether the data is really stuck, or whether init has completed and is only missing the end-of-init message.
Solution: Kill the TTK session, and clean up the init session if it remains. Then initialize the core resources manually.
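
The stuck-versus-completed check above can be expressed as a small helper. A hypothetical sketch: `status_complete` stands for whatever the replication_status table shows for the init session, and `pending_samples` are counts of rows still pending in the producer/consumer cache tables, taken at least twice some time apart. None of these names come from Prism itself.

```python
COMPLETE = "complete"
MISSING_END = "data delivered, end-of-init message missing"
PROCESSING = "still processing"
STUCK = "stuck"


def classify_core_init(status_complete: bool, pending_samples: list[int]) -> str:
    """Classify a core-resource init session from hypothetical inputs."""
    if status_complete:
        return COMPLETE
    if len(pending_samples) < 2:
        raise ValueError("take at least two samples some time apart")
    first, last = pending_samples[0], pending_samples[-1]
    if last == 0:
        # Nothing left in the cache tables, but the session never closed.
        return MISSING_END
    if last < first:
        return PROCESSING
    return STUCK
```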