
A Simple Guide to Root Cause Analysis in Indonesian Oil and Gas Operations
Author: Jen Megah Bremanda Sembiring (Reliability Engineer)
If you manage operations, maintenance, or asset integrity for an upstream or downstream facility in Indonesia, you have almost certainly seen this pattern. A critical piece of equipment fails. Production halts. The team scrambles, repairs the hardware, and submits a Root Cause Analysis (RCA) report within the required timeframe. The report is filed. The equipment restarts. Three to six months later, the same failure recurs.
This is not an RCA problem. It is an RCA execution problem.
The stakes in Indonesian oil and gas are significant. Indonesia's national crude oil production dropped from 606,000 barrels per day (BOPD) in 2023 to 580,000 BOPD in 2024, with unplanned shutdowns across various operating areas cited as a key technical contributing factor [1].
The purpose of RCA is not to produce a document that satisfies a compliance requirement. It is to permanently eliminate the failure mechanism so the asset never fails the same way again [3]. That distinction changes everything about how the process must be managed.
Selecting the Right Tool for the Right Problem

Not every failure requires the same analytical approach. Understanding which tool to deploy, and when, is a core competency for any manager overseeing a reliability function.
Fishbone (Ishikawa) Diagrams are most effective at the start of an investigation, particularly when the failure is ambiguous and multidisciplinary input is needed. By structuring potential causes across categories such as equipment, materials, methods, environment, and people, the fishbone format forces teams to brainstorm broadly before narrowing down. This prevents premature conclusions driven by the loudest voice in the room [4].
5 Whys works well for operationally straightforward failure sequences where cause and effect are relatively linear. Asking "why" five consecutive times about a pump seal failure can efficiently trace the path from the observed symptom back to an operational or procedural deviation. The critical limitation of 5 Whys, however, is that it follows a single logical thread. For multi-variable failures, it can lead the team to an intermediate cause and stop there, creating a false sense of completion [5].
Fault Tree Analysis (FTA) is reserved for high-consequence, multi-variable events involving primary rotating or process equipment.
Critically, no tool is effective without validated data triangulation. A fishbone diagram built from memory and anecdote will mislead you. The investigation must integrate DCS sensor logs (temperature, pressure, and vibration trends in the hours and days before the event), the full maintenance history of the asset, and a physical inspection or metallurgical analysis of the failed component. As Baker Hughes has noted in its reliability guidance, incomplete access to historical failure data forces engineers to work in the dark and makes it nearly impossible to validate root causes with confidence [6].
Case Study 1: The Incomplete RCA
In October 2013, a gas-cooling heat exchanger at Pertamina Hulu Energi Offshore North West Java (PHE-ONWJ) experienced a leakage event involving 93 tubes [7]. The incident shows that heat exchanger tube integrity failures are a documented failure mode in Indonesian offshore operations. Scenarios like this, and the organizational responses that follow them, illustrate a pattern that reliability professionals across the industry recognize immediately.
When a similar tube bundle leak occurs at a condensate processing facility, the maintenance team typically identifies the failure mode correctly (tube wall thinning due to corrosion or thermal degradation) and replaces the affected bundle within the required operational window. The RCA report, submitted under deadline pressure, concludes that the root cause was "material degradation due to normal service wear." Corrective action: increase inspection frequency for that specific exchanger.
The investigation, however, stops far too early. The team does not cross-reference DCS process logs. They do not identify that inlet fluid temperature had been running above the design specification for weeks prior to the event. They do not investigate why the process temperature drifted. They do not discover that a control valve upstream had been manually bypassed during a previous maintenance window and never returned to automatic mode.

This pattern is well documented in reliability engineering literature. As industry analysts have observed, teams under time pressure identify a superficial cause and stop there, leading to symptom fixes rather than addressing the systemic or latent root causes [5]. Concluding that a component failed due to wear, without investigating why the operating conditions enabled that wear to accelerate, is not an RCA. It is a maintenance report with extra steps.
The result is predictable. The same exchanger suffers a second tube leak under identical process conditions in the next operating cycle. The team addressed the symptom while the actual root cause, the uncontrolled process temperature from an unrestored control loop, remained entirely intact. The corrective maintenance budget absorbs the cost twice, and the production loss is booked twice [8].
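The check this investigation skipped, comparing logged inlet temperature against the design limit, can be sketched in a few lines. This is a minimal illustration only: the function name find_excursions, the 150 °C design limit, the seven-day threshold, and the sample readings are all hypothetical and do not come from any real DCS export or from the cited incident.

```python
from datetime import datetime, timedelta


def find_excursions(readings, design_limit, min_duration):
    """Return (start, end) spans where the value stayed above design_limit.

    readings: list of (timestamp, value) tuples sorted by timestamp.
    A span is reported only if the time between its first and last
    above-limit sample is at least min_duration.
    """
    spans = []
    start = None
    last_above = None
    for ts, value in readings:
        if value > design_limit:
            if start is None:
                start = ts  # first sample of a new excursion
            last_above = ts
        else:
            if start is not None and last_above - start >= min_duration:
                spans.append((start, last_above))
            start = None
    # Close an excursion that is still open at the end of the log.
    if start is not None and last_above - start >= min_duration:
        spans.append((start, last_above))
    return spans


# Hypothetical daily-average inlet temperatures in deg C (not real plant data).
t0 = datetime(2024, 1, 1)
values = [148, 149, 151, 153, 154, 155, 156, 155, 157, 158, 149, 152]
readings = [(t0 + timedelta(days=i), v) for i, v in enumerate(values)]

excursions = find_excursions(
    readings,
    design_limit=150.0,               # assumed design specification
    min_duration=timedelta(days=7),   # assumed threshold for "sustained"
)
for start, end in excursions:
    print(f"Above design limit from {start:%Y-%m-%d} to {end:%Y-%m-%d}")
```

A sustained excursion flagged this way is a prompt for the next question, why the temperature drifted, rather than a root cause in itself.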
Case Study 2: The Closed-Loop RCA
Centrifugal pumps are among the most widely deployed rotating assets in oil and gas processing facilities, performing critical services including crude transfer, produced water handling, cooling water circulation, and chemical injection [9]. A study conducted at PT Kilang Pertamina Internasional RU II Dumai specifically identified shaft misalignment (Risk Priority Number of 324) and lubrication failure (RPN of 160) as the two highest-priority failure modes in centrifugal pump analysis using the FMEA method, confirming that the most consequential failures often trace back to procedural and maintenance management gaps rather than material defects [10].
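For readers less familiar with FMEA scoring, a Risk Priority Number is conventionally the product of three ratings: severity, occurrence, and detection, each usually scored from 1 to 10. The sketch below shows that arithmetic. The individual ratings are hypothetical values chosen only so that the products match the two RPNs quoted above; they are not taken from the cited study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    def __post_init__(self):
        for field in ("severity", "occurrence", "detection"):
            score = getattr(self, field)
            if not 1 <= score <= 10:
                raise ValueError(f"{field} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings; only the resulting products (324 and 160)
# correspond to figures quoted in the article.
modes = [
    FailureMode("lubrication failure", severity=8, occurrence=5, detection=4),
    FailureMode("shaft misalignment", severity=9, occurrence=6, detection=6),
]

# Rank failure modes from highest to lowest priority.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN {mode.rpn}")
```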
A closed-loop RCA applied to a high-vibration trip on a centrifugal compressor or gas turbine follows a fundamentally different logic. When the reliability team refuses to stop at the physical failure and instead applies FTA paired with vibration spectrum analysis, a different picture emerges. Physical inspection of the rotor identifies damage to balance weights. Metallurgical analysis confirms fatigue cracking, not simple wear. The team then cross-references the maintenance history.
Research on fault diagnosis in gas turbines confirms that vibrations and bearing failures in rotating machinery are frequently the final expression of upstream process issues: poor lubrication resulting from insufficient oil supply, substandard lubricants, or clogged filters; and misalignment introduced during maintenance activities [11]. When the RCA team reviews the maintenance handover documentation from the last turnaround, they find that torque specifications for the balance weight fasteners were not recorded in the work order closure package. The technician applied a legacy torque value from an older compressor model rather than the OEM specification for this unit.
The root cause is not rotor fatigue. It is an inadequate maintenance handover procedure that allowed a critical parameter to be applied incorrectly without any checkpoint to catch it.
This distinction matters enormously for the corrective action. The permanent solution addresses the system, not the component: revision of the standard work order template to make OEM fastener torque specifications a mandatory, signed-off field; mandatory cross-check between outgoing and incoming craft supervisors during handover; and integration of compressor-specific technical data sheets into the contractor onboarding package for all future turnarounds.
As reliability engineering practitioners have consistently found, the investigation must go beyond the visible failure to the underlying mechanism. RCA identifies the latent organizational or process cause, not just the mechanical failure that expressed it [8]. When this is done correctly, the asset does not fail the same way again.
Three Things Managers Must Do Differently
Build a no-blame investigation culture from the top down.
Operators and technicians who fear punitive consequences will withhold the operational context that matters most. When people feel safe reporting bypassed procedures, skipped steps, or unclear work instructions, you reach the real root cause. As reliability culture experts have noted, every RCA should begin with a clear statement that the goal is to learn, not to assign blame, and investigations must be structurally separated from disciplinary processes [4]. Human error is an outcome, not a cause. Ask why the error was possible.
Require data validation before any RCA report is closed.
Implement a non-negotiable gate in your reporting workflow: no corrective action can be marked complete until the team has confirmed that the proposed root cause is supported by at least two independent data sources, whether DCS trend data, maintenance records, third-party inspection reports, or physical evidence. Opinions without data belong in discussions, not in formal reports that will drive capital decisions [6]. This single requirement eliminates the majority of incomplete analyses.
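One way to picture this gate is as a simple rule in the reporting workflow. The sketch below is an assumption about how such a rule might be encoded, not a description of any particular CMMS or RCA software: the SourceType categories mirror the four sources named above, and the Evidence class, the can_close function, and the document references are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    DCS_TREND = "dcs_trend"
    MAINTENANCE_RECORD = "maintenance_record"
    INSPECTION_REPORT = "inspection_report"
    PHYSICAL_EVIDENCE = "physical_evidence"


@dataclass(frozen=True)
class Evidence:
    source: SourceType
    reference: str             # hypothetical document or tag reference
    supports_root_cause: bool  # reviewer's judgment, recorded explicitly


def can_close(evidence, minimum_sources=2):
    """Allow closure only if enough independent source types support the cause.

    Two items from the same source type count once, so two DCS trends
    alone do not satisfy the gate.
    """
    supporting = {e.source for e in evidence if e.supports_root_cause}
    return len(supporting) >= minimum_sources


# Hypothetical evidence set for a tube-leak investigation.
evidence = [
    Evidence(SourceType.DCS_TREND, "inlet temperature trend", True),
    Evidence(SourceType.DCS_TREND, "outlet pressure trend", True),
]
print(can_close(evidence))  # two trends, one source type

evidence.append(Evidence(SourceType.MAINTENANCE_RECORD, "bypass work order", True))
print(can_close(evidence))  # now two independent source types
```

Counting source types rather than individual items is a deliberate choice here: it reflects the requirement for independent confirmation, not just a larger volume of the same kind of data.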
Track solution effectiveness over a defined timeline.
An RCA report without a structured follow-up verification step is an unfinished process. Assign a responsible engineer to confirm, at a defined interval (typically 6 to 12 months post-implementation), that the failure mechanism has not recurred and that the corrective action remains in force. If repeat incidents occur, it means something was missed: go back, reassess, and get fresh eyes on the analysis [5]. This closes the loop and transforms RCA from a one-time event into a functional reliability management system.
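The follow-up step can be made concrete with a small scheduling rule. The sketch below assumes the 6-to-12-month window mentioned above and treats any recurrence after implementation as a trigger to reopen the analysis. The function names and status labels are hypothetical and would need to be adapted to whatever tracking system a site already uses.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by a number of calendar months.

    The day is clamped to the last day of the target month,
    so 31 August plus 6 months gives the end of February.
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def verification_window(implemented_on, first=6, last=12):
    """Return the (opens, closes) dates of the effectiveness review window."""
    return add_months(implemented_on, first), add_months(implemented_on, last)


def review_status(implemented_on, today, recurrence_dates=()):
    """Classify a corrective action for follow-up purposes."""
    if any(d >= implemented_on for d in recurrence_dates):
        return "reopen"   # the failure came back: reassess the analysis
    opens, closes = verification_window(implemented_on)
    if today < opens:
        return "pending"  # too early to judge effectiveness
    if today <= closes:
        return "due"      # responsible engineer should verify now
    return "overdue"      # window passed without verification


# Hypothetical corrective action implemented on 15 March 2024.
implemented = date(2024, 3, 15)
print(verification_window(implemented))
print(review_status(implemented, today=date(2024, 10, 1)))
```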
Conclusion
Indonesia's upstream oil and gas sector is operating under significant production pressure. SKK Migas has set an ambitious target of one million barrels of oil per day by 2030 under the Indonesia Oil and Gas Strategic Plan (IOG 4.0) [1], a target that will remain out of reach if primary equipment continues to fail for reasons that were identified, documented, and filed without being permanently resolved.
Asset reliability is not built by filling out forms faster. It is built by managers who insist that every investigation runs deep enough to find the system that permitted the failure, not just the component that expressed it. The tools are available. The discipline to use them fully is the variable that separates operations that improve from those that repeat the same failures indefinitely.
Cliste Rekayasa Indonesia partners with oil, gas, and manufacturing companies to build RCA programs that go beyond compliance paperwork. We help management teams select the right investigative tools for each failure type, structure investigations that reach true systemic root causes, and put closed-loop verification systems in place so that corrective actions are tracked, validated, and proven effective over time. If recurring failures are still draining your operational budget, the problem is not your equipment. It is the investigation process behind it. Talk to our reliability engineering team about where your current RCA program is stopping short, and what it would take to stop the same failures from coming back.
Let’s Build a More Reliable Future.
References
[1] PricewaterhouseCoopers Indonesia. (2025). Oil and Gas in Indonesia: Investment, Taxation and Regulatory Guide, 14th Edition. PwC. Retrieved from https://www.pwc.com/id/en/energy-utilities-mining/assets/oil-gas-guide-2025.pdf
[2] VOI.id. (2024, August 5). SKK Migas Together With KKKS Focus On Increasing Production Amid Heavy Challenges 2024. Retrieved from https://voi.id/en/economy/405046
[3] Sologic. (n.d.). Root Cause Analysis for Oil and Gas. Retrieved from https://www.sologic.com/en-us/about/sectors/oil-gas-industry
[4] ReliaMag. (2025, November 1). How Root Cause Analysis Process Improvement Strengthens Reliability Culture. Retrieved from https://reliamag.com/cartoons/root-cause-analysis-process-improvement/
[5] Excellence Integrity Management. (2025, March 15). Root Cause Analysis in Process Industries: A Comprehensive Guide. Retrieved from https://excellenceintegrity.com/rca-in-process-industries/
[6] Baker Hughes Cordant. (n.d.). Avoid the Biggest Failures in Root Cause Analysis. Retrieved from https://www.bakerhughes.com/cordant/blog/avoid-biggest-failures-root-cause-analysis
[7] ResearchGate / Journal Publication. (2023). Quantitative Study of Risk Based Inspection (RBI) Using API 581 on Heat Exchanger Tube Bundle [referencing PHE-ONWJ 2013 incident]. Retrieved from https://www.researchgate.net/publication/376978415
[8] AsInt, Inc. (2026, January 23). Why Root Cause Analysis (RCA) Keeps Failing, And How Digital RCA Breaks the Cycle. Retrieved from https://asint.net/why-root-cause-analysis-rca-keeps-failing-and-how-digital-rca-breaks-the-cycle/
[9] RSIS International. (2026, February). Bearing Failure Analysis and Reliability Improvement of Centrifugal Pumps in Oil and Gas Facilities. Retrieved from https://rsisinternational.org/journals/ijriss/uploads/vol10-iss2-pg2406-2420-202602_pdf.pdf
[10] Journal of Technology and Vocational Studies / Institut Teknologi Padang. (2025). Failure Analysis of Centrifugal Pump (Overhung 212-P17) Using FMEA Method at PT Kilang Pertamina Internasional RU II Dumai. Vol. 3(2), pp. 173-178. https://doi.org/10.21063/jtv.2025.3.2.173-178
[11] ResearchGate. (2025). A Review on Fault Diagnosis Methods of Gas Turbine. Retrieved from https://www.researchgate.net/publication/398368741_A_Review_on_Fault_Diagnosis_Methods_of_Gas_Turbine