Lay Summary

Administrative data collected by healthcare organizations at the time of enrollment and during patient care are not always readily available for research. The same data type (eg, hospital admission) may come from multiple data sources in various formats and with inconsistent values, and changes in source data systems over time may leave the data fragmented. In this paper, we describe the contents, development, maintenance methodology, and other aspects of a research data warehouse (RDW) within a large integrated healthcare system, Kaiser Permanente Southern California. We also demonstrate the application of the data in the RDW and the volume of data that can be used for various population-based research projects. With 105 million person-years of health plan enrollment in 1981-2018 (30 million for Hispanic, 10 million for African American, and 7 million for Asian patients), about 19 million clinic/emergency room visits per year, and more than 200,000 hospital admissions per year, the RDW offers the opportunity to conduct high-quality population-based research studies.

Background

Electronic health records and many legacy systems contain rich longitudinal data that can be used for research; however, these data are typically not readily available.

Materials and methods

At Kaiser Permanente Southern California (KPSC), a research data warehouse (RDW) has been developed and maintained since the late 1990s and was widely extended in 2006. It aggregates and standardizes data collected from internal sources and a few external sources. This article provides a high-level overview of the RDW and discusses challenges common to data warehouses or repositories for research use. To demonstrate the application of the data, we report the volume, patient characteristics, and age-adjusted prevalence of selected medical conditions and utilization rates of selected medical procedures.
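As an illustration of what an age-adjusted prevalence involves, the sketch below uses direct standardization: each age-specific prevalence is weighted by the share of a standard population in that age band. The abstract does not state which standardization method or standard population was used, so the method, the function name `age_adjusted_prevalence`, the age bands, the weights, and all counts here are hypothetical assumptions for illustration only.

```python
from typing import Mapping


def age_adjusted_prevalence(
    cases: Mapping[str, float],
    population: Mapping[str, float],
    standard_weights: Mapping[str, float],
) -> float:
    """Directly standardized prevalence.

    Returns the weighted mean of age-specific prevalences, where the
    weights are the standard population's share in each age band.
    """
    total_weight = sum(standard_weights.values())
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive number")

    adjusted = 0.0
    for band, weight in standard_weights.items():
        if population[band] <= 0:
            raise ValueError(f"no enrollees in age band {band!r}")
        # Age-specific prevalence, scaled by the normalized standard weight.
        adjusted += (weight / total_weight) * (cases[band] / population[band])
    return adjusted


# Hypothetical counts and weights (assumptions, not values from the RDW).
example_cases = {"0-17": 10, "18-64": 50, "65+": 40}
example_population = {"0-17": 1000, "18-64": 1000, "65+": 200}
example_weights = {"0-17": 0.25, "18-64": 0.60, "65+": 0.15}

crude = sum(example_cases.values()) / sum(example_population.values())
adjusted = age_adjusted_prevalence(
    example_cases, example_population, example_weights
)
print(f"crude={crude:.4f} adjusted={adjusted:.4f}")
```

With these made-up numbers the crude prevalence is about 4.5%, while the adjusted value is 6.25%, because the hypothetical standard population has a larger share of older people than the example membership. Holding the weights fixed across calendar years is what makes prevalence estimates from different years comparable despite an aging population.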
Results

A total of 105 million person-years of health plan enrollment was recorded in the RDW between 1981 and 2018, with most healthcare utilization data available since the early or mid-1990s. Among active enrollees on December 31, 2018, 15% were ≥65 years of age; 33.9% were non-Hispanic white, 43.3% Hispanic, 11.0% Asian, and 8.4% African American. In addition, 34.4% of children (2-17 years old) and 72.1% of adults (≥18 years old) were overweight or obese. The age-adjusted prevalence of asthma, atrial fibrillation, diabetes mellitus, hypercholesteremia, and hypertension increased between 2001 and 2018. Hospitalization and emergency department (ED) visit rates appeared lower, and office visit rates appeared higher, at KPSC than the reported US averages.

Discussion and conclusion

Although the RDW is unique to KPSC, its methodologies and the experience gained in building it may provide useful insights for researchers at other healthcare systems worldwide in the era of big data analysis.