# MS Fabric Data Governance Framework — Enterprise Healthcare

**Author:** Conductor (Research + Synthesis)  
**Date:** 2026-05-05  
**Context:** Large hospital network, HIPAA compliance, multi-department rollout

---

## Executive Summary

This framework provides actionable guidance for **Lakehouse structure, naming conventions, Power BI templates, workspace architecture, and permissions** within the context of healthcare's regulatory and operational requirements.

**Key Principles:**
1. Medallion architecture (Bronze → Silver → Gold) for data lineage and compliance
2. Role-based access control (RBAC) aligned to clinical/administrative separation
3. Naming conventions that embed context, environment, and layer information
4. Template standardization to reduce shadow analytics and governance drift
5. HIPAA compliance through field-level masking, row-level security (RLS), and audit trails

---

## Part 1: Lakehouse Structure (Medallion Architecture)

### 1.1 Three-Layer Design

```
┌─────────────────────────────────────────────┐
│         GOLD (Analytics/Reporting)          │
│  - Dimensional models, aggregates           │
│  - Optimized for dashboards, Power BI       │
│  - Sensitive data masked at column level    │
└─────────────────────────────────────────────┘
                    ↑
┌─────────────────────────────────────────────┐
│      SILVER (Trusted/Curated Layer)         │
│  - Integrated, cleansed, enriched           │
│  - Business logic applied                   │
│  - Lineage tracked via Purview              │
│  - PII marked with sensitivity labels       │
└─────────────────────────────────────────────┘
                    ↑
┌─────────────────────────────────────────────┐
│      BRONZE (Raw/Landing Zone)              │
│  - Source system extracts (EHR, lab, etc.)  │
│  - Minimal transformation                   │
│  - Audit-ready schema (timestamps, etc.)    │
└─────────────────────────────────────────────┘
```

### 1.2 Lakehouse Naming & Structure

Create **one primary lakehouse per clinical domain**, then sub-divide by medallion layer.

#### Structure Template:

```
HealthSystem_Lakehouse  (primary container)
├── bronze/
│   ├── ehr/                     # EHR raw exports
│   │   ├── patients/
│   │   ├── encounters/
│   │   ├── diagnoses/
│   │   └── procedures/
│   ├── lab_results/             # Lab system raw data
│   ├── pharmacy/                # Pharmacy raw data
│   └── _metadata/               # Ingestion timestamps, row counts
│
├── silver/
│   ├── ehr_integrated/          # Conformed dim/fact tables
│   │   ├── dim_patient/
│   │   ├── dim_provider/
│   │   ├── fact_encounters/
│   │   ├── fact_diagnoses/
│   │   └── fact_lab_results/
│   ├── clinical_calendars/      # Reference data
│   └── _quality_metrics/        # Data completeness, freshness
│
└── gold/
    ├── analytics/               # Optimized for BI
    │   ├── patient_demographics/
    │   ├── clinical_outcomes/
    │   ├── provider_performance/
    │   └── financial_metrics/
    ├── reporting/               # Pre-aggregated cubes
    └── secure_subsets/          # RLS-protected views
```

### 1.3 Lakehouse Permissions (Role-Based)

| Role | Permission | Use Case |
|------|-----------|----------|
| **Data Engineer** | Read/Write Bronze, Silver | Ingestion, transformation, quality checks |
| **Data Steward** | Read Silver, manage Gold | Lineage approval, metadata governance |
| **Analyst/BI Dev** | Read Silver/Gold, create reports | Dashboard creation, ad hoc queries |
| **Clinical User** | Read Gold only (RLS applied) | Dashboard consumption, no raw data |
| **Compliance Officer** | Read-only audit logs, lineage | Regulatory review |

---

## Part 2: Naming Conventions

### 2.1 Core Framework (6-Parameter)

Use **snake_case** throughout (not camelCase for Fabric objects—helps with CLI tooling and cross-platform consistency).

#### Pattern:
```
[environment]_[domain]_[component_type]_[medallion_layer]_[purpose]
```

#### Parameters:

| Parameter | Values | Example |
|-----------|--------|---------|
| **environment** | `prod`, `qa`, `dev` | `prod` |
| **domain** | `ehr`, `lab`, `pharm`, `fin` | `ehr` |
| **component_type** | `lh` (lakehouse), `wh` (warehouse), `ppl` (pipeline), `ds` (dataset) | `lh` |
| **medallion_layer** | `bronze`, `silver`, `gold` | `silver` |
| **purpose** | Concise descriptor | `patients`, `encounter_facts`, `cohort_analysis` |

#### Examples:

```
prod_ehr_lh_bronze_raw_extracts
prod_ehr_lh_silver_integrated
prod_ehr_lh_gold_analytics
qa_lab_lh_silver_quality_checks
dev_pharm_ppl_ingestion_delta
prod_fin_ds_revenue_cycle
```

### 2.2 Tables & Columns

#### Table Naming (within Lakehouse):

```
[layer]_[domain]_[entity]_[version]

Bronze:   bronze_ehr_patients_v1
Silver:   silver_ehr_dim_patient
Gold:     gold_ehr_fact_encounters

# Dimension tables
silver_ehr_dim_provider
silver_ehr_dim_facility

# Fact tables
silver_ehr_fact_encounters
silver_ehr_fact_diagnoses
```

#### Column Naming:

- **Keep snake_case** (consistency with Python/dbt)
- **Prefix PII columns:** `pii_` or mark via Microsoft Purview sensitivity labels
- **Timestamp columns:** `_created_at`, `_updated_at`, `_valid_from`, `_valid_to` (for SCD Type 2)
- **Surrogate keys:** `pk_patient_id` (clarity on primary keys)
- **Foreign keys:** `fk_provider_id` (clarity on references)

#### Example Schema:

```sql
-- Silver layer example
CREATE TABLE silver_ehr_dim_patient (
    pk_patient_id BIGINT PRIMARY KEY,
    pii_ssn STRING MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)'),
    pii_first_name STRING,
    pii_last_name STRING,
    patient_age_group STRING,
    facility_id INT,
    _created_at TIMESTAMP,
    _updated_at TIMESTAMP,
    _valid_from TIMESTAMP,
    _valid_to TIMESTAMP
);
```

### 2.3 Sensitive Data Marking

**All PII in Silver/Gold must be tagged in Purview:**

- **High Sensitivity:** SSN, medical record #, diagnosis codes tied to identifiers
- **Medium Sensitivity:** Name, DOB, address (if tied to patient record)
- **Low Sensitivity:** Age group, facility name, general demographics

Tag these in Microsoft Purview during Silver layer creation, so column-level RLS + masking apply automatically in Gold.

---

## Part 3: Power BI Templates & Standardization

### 3.1 Template Architecture

Create **one core template per reporting domain** to prevent shadow BI and ensure compliance.

#### Template Checklist:

| Component | Standard | Notes |
|-----------|----------|-------|
| **Semantic Model** | Single, shared across org | Minimize duplicate datasets |
| **Naming** | `[domain]_semantic_model` | e.g., `clinical_outcomes_semantic_model` |
| **RLS Rules** | Pre-configured by clinical role | Clinician sees own facility/unit only |
| **Measures** | DAX library, versioned in Git | Reusable, auditable formulas |
| **Color Palette** | Hospital brand standards | Accessibility (colorblind-safe) |
| **Layouts** | 1200px width, 16:9 ratio | Consistent across devices |

### 3.2 Power BI Dataset Template Structure

```
Semantic Model: prod_clinical_outcomes_semantic
├── Tables (mapped to Silver layer)
│   ├── dim_patient
│   ├── dim_provider
│   ├── dim_facility
│   ├── dim_time
│   └── fact_encounters
│
├── Measures (DAX)
│   ├── [Total_Encounters] = COUNTA(fact_encounters[encounter_id])
│   ├── [Avg_LOS] = AVERAGE(fact_encounters[length_of_stay_days])
│   ├── [Readmission_Rate] = ...
│   └── [Provider_Quality_Score] = ...
│
├── Roles (RLS)
│   ├── [Clinical_Staff] - See only own facility
│   ├── [Department_Lead] - See own department
│   ├── [Hospital_Admin] - See all facilities
│   └── [Analyst] - See gold layer only (no patient detail)
│
└── Metadata
    ├── Certified = TRUE
    ├── Owner = Data Steward (clinical_stewards@hospital.org)
    ├── Refresh_Frequency = Daily 2 AM UTC
    └── Last_Validated = [date]
```

### 3.3 Row-Level Security (RLS) Configuration

#### Example RLS Rule (for Clinical Staff):

```DAX
[FacilityID] = USERNAME()  # Maps user to facility
```

**More Granular** (user-to-unit mapping via table):

```DAX
[DepartmentID] IN CALCULATETABLE(
    VALUES(ProviderMapping[DepartmentID]),
    ProviderMapping[ADName] = USERNAME()
)
```

### 3.4 Template Report Pages (Mandatory Standard)

Every clinical reporting template must include:

1. **Overview Dashboard** — KPIs, key metrics, no drill-down
2. **Detailed Analytics** — Filtered by user's facility/unit (RLS applied)
3. **Trending & Benchmarks** — Compare own performance to peers
4. **Data Quality / Lineage** — Link to Purview asset info
5. **Help & Definitions** — Glossary, metric definitions

---

## Part 4: Workspace Architecture & Permissions

### 4.1 Workspace Naming Scheme

```
[environment]_[domain]_[purpose]_workspace

Examples:
prod_clinical_analytics_workspace
prod_financial_reporting_workspace
qa_lab_analytics_workspace
dev_poc_workspace
```

### 4.2 Workspace Tier Assignment

| Workspace Type | Capacity | Governance | Access |
|---|---|---|---|
| **Prod Analytics** | Premium / Dedicated | Strict (CI/CD, change log) | Analyst + steward approval |
| **Prod Reporting** | Premium | Strict (templates only) | Published reports, RLS enforced |
| **QA/Testing** | Standard | Medium (testing allowed) | Data engineers + analysts |
| **Dev/Sandbox** | Standard | Light | Limited to developer group |

### 4.3 Workspace Role Model

Create **security groups** in Azure AD, then assign to Fabric workspaces:

```
HealthSys-Fabric-Admins                    → Workspace Admin
HealthSys-Clinical-Analytics-Owners        → Workspace Admin (data stewards)
HealthSys-Clinical-Analytics-Contributors  → Contributor (data engineers, analysts)
HealthSys-Clinical-Analytics-Viewers       → Viewer (clinicians, end users)
HealthSys-Finance-Contributors             → Contributor (FP&A team)
HealthSys-Compliance-Auditors              → Viewer (audit logs only)
```

#### Permission Matrix:

| Role | Create Workspace | Create/Edit Reports | Create Datasets | View Artifacts | Manage Permissions |
|------|---|---|---|---|---|
| **Admin** | ✓ | ✓ | ✓ | ✓ | ✓ |
| **Contributor** | ✗ | ✓ | ✓ | ✓ | ✗ |
| **Member** | ✗ | ✓ | ✗ | ✓ | ✗ |
| **Viewer** | ✗ | ✗ | ✗ | ✓ | ✗ |

### 4.4 Workspace Governance Policies

**Enforce via Admin Portal:**

1. **Disable Publish to Web** — All Fabric reports require authentication
2. **Restrict Workspace Creation** — Only `HealthSys-Fabric-Admins` can create
3. **Require Sensitivity Labels** — All reports marked `Internal`, `Confidential`, or `Restricted`
4. **Enable Service Principal Authentication** — For CI/CD, data ingestion pipelines
5. **Audit Trail Enabled** — Log all access, dataset refreshes, permission changes

### 4.5 Tenant-Level Settings

In **Fabric Admin Portal**, enforce:

```
Tenant Settings (apply to all workspaces):

✓ Users can create new workspaces: [Group: Fabric-Admins only]
✓ Users can create semantic models: [Disabled]  # Force use of centralized templates
✓ Publish to web: [Disabled]
✓ Export to PowerPoint: [Disabled]  # HIPAA consideration: prevent unsecured exports
✓ Export to PDF: [Enabled + Audit]
✓ Automatic page refresh: [Max 30 min]
✓ Allow service principal to use Fabric APIs: [Enabled + audit]
✓ Enable lineage: [Enabled]
✓ Allow dataflows to refresh: [Enabled + schedule approval]
```

---

## Part 5: Data Governance & Compliance Integration

### 5.1 Microsoft Purview Integration

**Use Purview to:**

1. **Track Lineage:** Bronze → Silver → Gold, show data flow
2. **Mark Sensitive Data:** Tag PII columns automatically
3. **Generate Data Catalog:** Enable data discovery by role
4. **Audit Access:** Who accessed what, when, why

#### Purview Scans:

- Schedule nightly scans of Silver/Gold layers
- Auto-classify PII (SSN, medical record #, patient name)
- Tag with HIPAA sensitivity labels
- Publish findings to business glossary

### 5.2 HIPAA Compliance Checklist

| Requirement | Fabric Implementation |
|---|---|
| **Encryption at Rest** | OneLake uses Microsoft-managed keys (default) |
| **Encryption in Transit** | TLS 1.2+ for all network traffic |
| **Access Audit Trail** | Fabric audit logs + Purview lineage |
| **Data Minimization** | Gold layer uses column-level masking + RLS |
| **Retention Policy** | Archive Bronze after 90 days (configurable per data type) |
| **User Authentication** | Azure AD MFA (enforce for all users) |
| **Role-Based Access** | RLS + workspace roles |
| **Data Breach Notification** | Alert on sensitive data access anomalies |

### 5.3 Data Quality Gates (Bronze → Silver)

Every table in Silver must pass:

```
Quality Rule | Pass Criteria
─────────────────────────────────
Completeness | >99% non-null in required fields
Accuracy     | Match source system counts (±0.1%)
Timeliness   | Refresh within SLA (e.g., within 2 hours of source)
Uniqueness   | PK unique count = expected unique entities
Conformity   | All PKs, FKs follow naming standard
Validity     | Date ranges valid, codes match lookup tables
```

Implement as a **Silver Validation Pipeline** (Data Factory) that flags failures and notifies data steward.

---

## Part 6: Implementation Roadmap

### Phase 1: Foundation (Weeks 1–4)

- [ ] Audit current data sources (EHR, lab, pharmacy, financial systems)
- [ ] Define medallion structure + folder layout
- [ ] Implement naming conventions in Bronze layer
- [ ] Set up audit logging + Purview scans
- [ ] Create first semantic model template (clinical domain)

### Phase 2: Integration (Weeks 5–8)

- [ ] Ingest 3–5 data domains into Bronze
- [ ] Build Silver transformations (dbt or Data Factory)
- [ ] Apply sensitivity labels + RLS rules
- [ ] Deploy 2–3 Gold analytics lakehouses
- [ ] Train first user group on templates

### Phase 3: Scale (Weeks 9–12)

- [ ] Roll out remaining data domains
- [ ] Publish 5–10 Power BI reports from templates
- [ ] Establish data stewardship council
- [ ] Implement automated quality gates
- [ ] Conduct HIPAA compliance audit

### Phase 4: Optimization (Weeks 13+)

- [ ] Monitor refresh times, query performance
- [ ] Gather user feedback, refine templates
- [ ] Archive cold Bronze data
- [ ] Establish SLAs for data refresh, support

---

## Part 7: Quick Reference Tables

### Naming Convention Quick Ref

```
Lakehouse:        prod_ehr_lh_bronze_raw
Pipeline:         prod_ehr_ppl_ingestion_delta
Semantic Model:   prod_clinical_outcomes_semantic
Report:           Clinical_Outcomes_Dashboard
Workspace:        prod_clinical_analytics_workspace
```

### Workspace Roles

```
Admin        → Full control, manage permissions
Contributor → Edit, create reports/datasets
Member       → Edit reports only
Viewer       → Read-only
```

### Medallion Layers

```
Bronze  → Raw, as-is from source
Silver  → Cleansed, conformed, integrated
Gold    → Analytics-ready, aggregated, masked
```

---

## Part 8: Governance Oversight

### Monthly Checklist

- [ ] Purview quality scan results reviewed
- [ ] Data lineage documentation updated
- [ ] RLS rules tested (spot-check 3 users)
- [ ] Workspace permissions audit (compare to AD groups)
- [ ] Slow-running queries identified + optimized
- [ ] Data steward sign-off on Silver schema changes

### Quarterly Review

- [ ] HIPAA compliance audit (access logs, sensitivity labels)
- [ ] Performance SLA review (refresh times, query latency)
- [ ] New data domain readiness assessment
- [ ] User feedback on template usability
- [ ] Cost optimization (capacity, compute, storage)

---

## Conclusion

This framework balances **compliance, scalability, and usability** for enterprise healthcare. The medallion architecture enforces lineage, naming conventions prevent sprawl, templates reduce shadow BI, and RBAC + RLS ensure HIPAA safety.

**Start with Phase 1, validate, then scale.** Adjust the specifics based on your current tooling and team.

---

## Appendices

### A. Sample DAX Measures (for templates)

```DAX
Total_Encounters = COUNTA(fact_encounters[encounter_id])

Avg_Length_of_Stay = AVERAGE(fact_encounters[los_days])

Readmission_Rate = 
    DIVIDE(
        COUNTROWS(FILTER(fact_encounters, fact_encounters[readmitted_flag] = 1)),
        COUNTROWS(fact_encounters),
        0
    )

Provider_Quality_Score = 
    CALCULATE(
        [Readmission_Rate],
        FILTER(dim_provider, dim_provider[specialty] = "Cardiology")
    )
```

### B. Sample RLS Rules

```DAX
-- Clinical Staff sees only their facility
[facility_id] = VALUE(USERNAME())

-- Department Lead sees own department
[dept_id] IN CALCULATETABLE(
    VALUES(ProviderMapping[dept_id]),
    ProviderMapping[user_email] = USERNAME()
)

-- Analyst role sees aggregate-only (no individual patient detail)
NOT([is_phi_detail])
```

### C. References

- Microsoft Fabric Best Practices: learn.microsoft.com/fabric
- HIPAA & Cloud Compliance: cms.gov, hitech.cms.gov
- Data Governance: Gartner CDO playbook
- Medallion Architecture: Databricks documentation
