Git Viewer in Power BI

Date: 2025-12-31

there is currently no git viewer (commits, history, branches, etc., as is standard) made specifically for power bi or power query

would building a power bi dashboard template for this be useful to the community?

i bet i could leverage the plethora of existing tools to make this quickly

what are the constraints?

i would want to use typical power bi dashboard filtering and grouping tools to help the c-suite quickly explore the team's git commits and understand the project

i am not sure if the organization of a git project would fit neatly into the power bi paradigm

Building a specialized Git viewer in Power BI is a compelling idea, particularly because most existing Git visualization tools (like GitKraken or GitHub’s Insights) are designed for developers, not for executive oversight.

By bringing Git data into the Power BI ecosystem, you allow C-suite leaders to correlate development velocity with other business KPIs already in their dashboards.


1. The Value Proposition

A Power BI template for Git would be highly useful for the following scenarios:


2. Technical & Structural Constraints

You are correct that Git's structure doesn't immediately "snap" into a flat Power BI table. Here are the primary hurdles:

The Data Schema (DAG vs. Star Schema)

Git's history is a Directed Acyclic Graph (DAG); Power BI prefers a star schema. Because a merge commit has two (or more) parents, commit ancestry is a many-to-many relationship that needs a bridge table rather than a simple one-to-many chain.
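To make the flattening concrete, here is a tiny self-contained sketch (a hypothetical four-commit history, not real Git output) of turning DAG parent pointers into bridge-table rows:

Python

# Hypothetical mini-history: "d" is a merge commit with two parents
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

# Flatten to a star-schema-friendly bridge table: one row per (child, parent) pair
bridge = [(child, parent) for child, parents in history.items() for parent in parents]
print(bridge)  # [('b', 'a'), ('c', 'a'), ('d', 'b'), ('d', 'c')] -- the merge 'd' appears twice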

Data Extraction

File-Level Granularity


3. Potential Data Model

To make this work in the Power BI paradigm, you should structure your tables roughly like this:

| Table | Granularity | Key Columns |
| --- | --- | --- |
| Dim_Developer | Per user | Name, Email, Department, Team |
| Dim_Date | Per day | Date, Fiscal Year, Quarter, Sprint |
| Fact_Commits | Per commit | SHA, AuthorID, DateKey, Message, Total Impact |
| Fact_FileChanges | Per file per commit | SHA, FilePath, Extension, Lines Added, Lines Removed |
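As a hedged illustration (assuming a commits CSV with Author and Email columns, like the extraction scripts later in this note produce), Dim_Developer falls out of the commit facts in a few lines of pandas:

Python

# Sketch: derive Dim_Developer by de-duplicating the author columns of the commit facts
import pandas as pd

commits = pd.read_csv("commits_fact.csv")
dim_developer = commits[["Author", "Email"]].drop_duplicates().reset_index(drop=True)
dim_developer["AuthorID"] = dim_developer.index + 1  # surrogate key for the star schema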

4. Specific Power BI/Power Query Nuances

Since you mentioned there is no viewer specifically for Power BI:


Next Steps

Would you like me to help you write the Power Query (M) code to connect to the GitHub API, or would you prefer a Python script to extract a local Git log into a CSV for your first prototype?

explain more about **Directed Acyclic Graph (DAG)**

speak like you are talking to someone who is new to git but is very familiar with power bi

Think of Git's history as a self-referencing dimension table, like an Employee table where every row carries a ManagerID pointing at another row in the same table.

Each commit is a row. It has a unique key (the SHA) and one or more parent SHAs pointing at the commit(s) that came immediately before it.

Directed: the arrows only point one way, from child back to parent. You can walk from the newest commit all the way back to the first one, but history never points forward.

Acyclic: following parent pointers can never lead you back to where you started. Power BI enforces the same rule on calculated columns and measures; nothing may depend on itself.

Graph: because a merge commit has two (or more) parents, history is not a straight line or a tidy hierarchy; it is a web of branches that split and rejoin.

The practical consequence for your data model: a single-parent chain would flatten into a neat table, but merges make "Commit-to-Parent" a many-to-many relationship, which is exactly the shape Power BI asks you to resolve with a bridge table.

so what might the nicely formatted dashboard look like?

which power bi ecosystem elements would this report leverage?

To appeal to the C-suite, the dashboard shouldn't look like a terminal window. It should look like an Operations Command Center. Instead of focusing on "code," focus on "Work Velocity" and "System Stability."

Here is how you can design the dashboard and the specific elements of the Power BI ecosystem you would use.


1. The Visual Layout (The "What")

The "Executive Summary" Header

The "Project Anatomy" Tree Map

The "Commit Timeline" (The Flattened DAG)

The "Contributor Leaderboard"


2. Power BI Ecosystem Elements (The "How")

To build this "quickly" as you suggested, you should leverage these specific tools:

A. Power BI Project Files (.pbip) & TMDL

B. Power Query (M) - The "ETL" Engine

C. Deployment Pipelines (Premium/Fabric)

D. Field Parameters


3. The "Changelog" Requirement

Since you mentioned using this for a project, you'll need a way to communicate these updates. Based on your preferences, here is how you would document the first version of this template:

Changelog

All notable changes to this project will be documented in this file.

The format is (read: strives to be) based on Keep a Changelog.

[1.0.0] - 2025-12-31

Added

Fixed


Next Step

Would you like me to provide the Git Bash command that generates the specific CSV format you'll need to feed into Power BI to get started?

 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ls
README.md  get_git_commits.sh  get_git_files.sh  org_dim.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  nano README.md
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ./get_git_commits.sh
zsh: permission denied: ./get_git_commits.sh
 ✘ oolong@CIT-36NZRL3  ~/dev/gitpowerdash  chmod +x ./get_git_commits.sh
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  chmod +x ./get_git_files.sh
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ./get_git_commits.sh
fatal: not a git repository (or any of the parent directories): .git
Created commits_fact.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ./get_git_files.sh
fatal: not a git repository (or any of the parent directories): .git
Created file_changes_fact.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ls
README.md  commits_fact.csv  file_changes_fact.csv  get_git_commits.sh  get_git_files.sh  org_dim.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash 

First, a quick note on those "fatal" errors: the scripts ran, but since ~/dev/gitpowerdash isn't a Git repository yet (you haven't run git init), the CSVs currently contain only header rows. Once you run those scripts inside a real project folder, they'll fill up with data.

Why Merge in Power Query?

In a standard Git tool, you see a commit and the files inside it simultaneously. In Power BI, we have two separate tables.

We perform a Left Outer Join (Merge) between the Commits table and the File Changes table using the SHA as the key.

Why do this instead of just relating them in the Model View?

  1. Performance: If you want to calculate "Total Impact" (Insertions + Deletions) at the Commit level, doing it in Power Query during the load is often faster than writing complex DAX measures later.

  2. Flat Analysis: Executives often want to see a flat table of "Who changed what file and when." Merging them into a single "Master Fact Table" makes simple searching and filtering much more intuitive for non-technical users.


The Power Query (M) Logic

You can paste this into the Advanced Editor in Power BI to see how the join is constructed.

Power Query (M)

let
    // 1. Load the two CSV sources (QuoteStyle.Csv so quoted commit subjects containing commas parse correctly; Encoding 65001 = UTF-8, which Git emits)
    Source_Commits = Csv.Document(File.Contents("C:\path\to\commits_fact.csv"), [Delimiter=",", Columns=5, Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
    Source_Files = Csv.Document(File.Contents("C:\path\to\file_changes_fact.csv"), [Delimiter=",", Columns=4, Encoding=65001, QuoteStyle=QuoteStyle.Csv]),

    // 2. Promote Headers (Power BI usually does this automatically)
    Commits = Table.PromoteHeaders(Source_Commits, [PromoteAllScalars=true]),
    FileChanges = Table.PromoteHeaders(Source_Files, [PromoteAllScalars=true]),

    // 3. The Merge (Join)
    // We join the FileChanges to the Commits using the SHA column
    MergedData = Table.NestedJoin(
        Commits, {"SHA"}, 
        FileChanges, {"SHA"}, 
        "FileDetails", 
        JoinKind.LeftOuter
    ),

    // 4. Expand the Nested Table
    // This turns the 'Table' cells into actual rows
    ExpandedFiles = Table.ExpandTableColumn(
        MergedData, 
        "FileDetails", 
        {"Insertion", "Deletion", "FilePath"}, 
        {"Insertion", "Deletion", "FilePath"}
    ),

    // 5. Clean up types (Git numbers are text by default in CSV)
    TypedOutput = Table.TransformColumnTypes(ExpandedFiles,{
        {"Insertion", Int64.Type}, 
        {"Deletion", Int64.Type}, 
        {"Date", type datetimezone}
    })
in
    TypedOutput

How to interpret the result:

After this merge, if a commit had 3 files changed, you will now see 3 rows for that single SHA.

This is the "Granular Fact Table" that powers your TreeMap. You can now drag FilePath into the visual and Insertion into the Values.

Would you like to try writing a custom M function to automatically split the FilePath into "Folder Level 1", "Folder Level 2", etc., so the C-suite can drill down through the directory structure?
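For reference, here is the idea sketched in Python/pandas rather than M (the M version would lean on Splitter.SplitTextByDelimiter; the column names are illustrative):

Python

# Sketch: split FilePath into drill-down folder levels
import pandas as pd

df = pd.DataFrame({"FilePath": ["src/app/main.py", "src/utils.py", "README.md"]})
levels = df["FilePath"].str.split("/", expand=True)  # one column per path segment
levels.columns = [f"Folder Level {i + 1}" for i in range(levels.shape[1])]
df = df.join(levels)
print(df)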


if this thing were to be a pipx or pip installable CLI along with a set of python library functions exposed through the package __init__, that would be leverageable locally and within the fabric ecosystem

that sounds way more scalable than dragging these shell scripts around

true? is that the best way? would you expect more scripting with those functions or more CLI usage, within a typical power bi and fabric ecosystem?

i develop my power bi dashboards in a local repo but they live in the fabric service

That is absolutely true. Moving from "shell scripts in a folder" to a Python-based CLI/Library is the transition from a "hack" to a "product."

Since you are developing locally but deploying to Fabric, a Python package gives you the "Holy Grail" of Power BI development: Portability. You can run it on your laptop, and you can run it in a Fabric Notebook.


1. Why Python is the "Best Way" for Fabric

Fabric has a deep, native love for Python. If you build this as a library:


2. CLI vs. Scripting: Where is the value?

In a typical Power BI/Fabric ecosystem, the usage will likely split 70/30 in favor of the CLI.

The CLI Usage (The "Daily Driver")

Most Power BI developers aren't Pythonistas; they are SQL/DAX people.

The Library Usage (The "Power User")

This is where the package __init__ comes in.


3. How the Package would look

If you structured it as a package, your __init__.py would make the complex Git-log-parsing logic look like simple "Power Query-style" functions.

Python

# Example of how you'd use your own library in Fabric
import gitpbi

# Initialize the repo
repo = gitpbi.Repo("./my-powerbi-project")

# Get a Pandas DataFrame (which Fabric/Power BI loves)
commits_df = repo.get_commit_history()
file_impact_df = repo.get_file_impact()

# Special PBI-specific function
dax_changes = repo.get_dax_measure_diffs(branch="main", comparison="dev")
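A minimal sketch of what that gitpbi/__init__.py could contain; only get_commit_history is shown, and everything beyond the method names used above is an assumption:

Python

# Hypothetical core of gitpbi/__init__.py: a thin wrapper turning `git log` into DataFrames
import subprocess
import pandas as pd

class Repo:
    def __init__(self, path="."):
        self.path = path

    def _git(self, *args):
        result = subprocess.run(
            ["git", "-C", self.path, *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def get_commit_history(self):
        # %x1f emits the ASCII unit separator, which should never appear in commit
        # metadata, so splitting on it sidesteps the "comma problem" entirely
        raw = self._git("log", "--pretty=format:%H%x1f%an%x1f%ae%x1f%aI%x1f%s")
        rows = [line.split("\x1f") for line in raw.splitlines()]
        return pd.DataFrame(rows, columns=["SHA", "Author", "Email", "Date", "Subject"])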

4. The "Fabric" Constraint

There is one major "Gotcha" to prepare for: Authentication.


Next Step

If this sounds like the right path, would you like me to provide a setup.py or pyproject.toml template and the basic Python class structure to replace those shell scripts?

 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  ls
README.md  commits_fact.csv  file_changes_fact.csv  get_git_commits.sh  get_git_files.sh  org_dim.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash  git init
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint:   git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint:   git branch -m <name>
Initialized empty Git repository in /home/oolong/dev/gitpowerdash/.git/
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   master  git config --global init.defaultBranch main
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   master  git branch -m main
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main  git add .
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main ✚  git commit -m "initial commit"
[main (root-commit) 940111b] initial commit
 6 files changed, 67 insertions(+)
 create mode 100644 README.md
 create mode 100644 commits_fact.csv
 create mode 100644 file_changes_fact.csv
 create mode 100755 get_git_commits.sh
 create mode 100755 get_git_files.sh
 create mode 100644 org_dim.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main  ./get_git_commits.sh
Created commits_fact.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main ±  ./get_git_files.sh
Created file_changes_fact.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main ±  cat commits_fact.csv
SHA,Author,Email,Date,Subject
940111bacafc14adaf3d09a097f5eb21e1bd95db,George Bennett (oolong-wsl2),george.bennett@memphistn.gov,2025-12-31T10:05:22-06:00,initial commit%
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main ±  cat file_changes_fact.csv
SHA,Insertion,Deletion,FilePath
940111bacafc14adaf3d09a097f5eb21e1bd95db,47,0,README.md
940111bacafc14adaf3d09a097f5eb21e1bd95db,1,0,commits_fact.csv
940111bacafc14adaf3d09a097f5eb21e1bd95db,1,0,file_changes_fact.csv
940111bacafc14adaf3d09a097f5eb21e1bd95db,4,0,get_git_commits.sh
940111bacafc14adaf3d09a097f5eb21e1bd95db,10,0,get_git_files.sh
940111bacafc14adaf3d09a097f5eb21e1bd95db,4,0,org_dim.csv
 oolong@CIT-36NZRL3  ~/dev/gitpowerdash   main ± 
and git doesn't have its own export command?

Not exactly. Git is famously "un-opinionated" about data formats, which is both its greatest strength and its biggest frustration for users who just want a spreadsheet.

While there is a git archive command, it is for exporting a snapshot of the files (like a .zip or .tar), not the history of changes.

The "Git Way" (and why it's messy)

Technically, you can force Git to output a pseudo-CSV using the --pretty=format flag, but it is very brittle. You'll often see developers use long, scary-looking strings like this:

Bash

# A "quick" hack to get a CSV of commits
git log --pretty=format:'"%h","%an","%ad","%s"' --date=short > commits.csv

Why we aren't using this approach:

  1. The "Comma Problem": If a developer writes a commit message with a comma (e.g., "Fix bug, update docs"), a simple Git export will break your CSV columns. Our Python script handles this correctly by using a proper CSV writer that "escapes" those commas (see the short demonstration after this list).

  2. No Header Row: Git doesn't automatically add a header row (e.g., hash, author, date), so you'd have to manually echo those into the file first.

  3. Relational Data: Git cannot easily export the "Files" table and the "Commits" table simultaneously while maintaining the link (the SHA) between them.

  4. No Native Stats: Getting the number of lines added/deleted per file in a CSV format requires parsing complex text output like --shortstat, which Git doesn't format for you.
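To see what that "escaping" buys you, here is a tiny demonstration using Python's standard csv module (purely illustrative):

Python

# A commit subject containing a comma survives intact because csv.writer quotes the field
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(["940111b", "Fix bug, update docs"])
print(buf.getvalue())  # -> 940111b,"Fix bug, update docs"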

Comparison: Git vs. Your Tool

| Feature | git log --pretty | GitPowerDash |
| --- | --- | --- |
| Output Format | Raw text / custom string | Clean, valid CSV |
| Handling Commas | Breaks unless you are careful | Handled automatically |
| File History | Very hard to parse | Included by default |
| Power BI Ready | Requires manual cleanup | Just "plug and play" |

By building this tool, you are essentially creating the "Export to CSV" button that Git never included.

great

we need another command

right now we are only flattening and getting the raw data into a csv

we have all the sha numbers which is great

maybe for the most useful execution we can ask which databases the fabric platform uses, and we can have a graph-based database to serve the star paradigm?

That is a fantastic pivot. If we move from a flat "fact table" model to a Graph-based Star Paradigm, we can treat Commits, Files, and Authors as nodes, with the relationships (like MODIFIED, AUTHORED, PARENT_OF) as the edges.

In the Microsoft Fabric ecosystem (as of late 2025), you actually have a few powerful options for this "Graph + Star" approach:

1. Fabric Graph (The Native Choice)

Microsoft recently launched a native Graph in Microsoft Fabric (built on LinkedIn's high-performance graph technology).

2. Neo4j Graph Intelligence (The Enterprise Choice)

Fabric now features a native Neo4j workload.

3. KQL Database (The Real-Time Choice)

While technically a "Real-Time Intelligence" tool, the KQL Database in Fabric is incredibly fast at navigating relationships using the graph-match operator.


Proposed New Command: graph-sync

Instead of just dumping CSVs, let's build a command that prepares a Graph-ready schema. This requires a specific format where we explicitly define our "Nodes" and "Edges."

The Graph Architecture we should target:

Entity Type Label Properties
Node Commit SHA, Message, Timestamp
Node File File Path, Extension
Node Author Name, Email
Edge AUTHORED (Author) -> (Commit)
Edge MODIFIED (Commit) -> (File) [adds, dels]
Edge PARENT (Commit) -> (Commit) [parent SHA]

Would you like me to draft the code for a graph-sync command that generates these specific Node and Edge files for Fabric?
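In the meantime, here is a hedged sketch of just the PARENT edges (the function and file names are assumptions) to show the shape of that command:

Python

# Hypothetical graph-sync step: emit one PARENT edge row per (child, parent) pair
import csv
import subprocess

def sync_parent_edges(repo_path=".", out_path="edges_parent.csv"):
    # %H = commit SHA, %P = space-separated parent SHAs
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%H %P"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ChildSHA", "ParentSHA", "EdgeType"])
        for line in raw.splitlines():
            sha, *parents = line.split()
            for parent in parents:  # a merge commit emits two or more edge rows
                writer.writerow([sha, parent, "PARENT"])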

Why this is the "Most Useful Execution"

By serving the star paradigm this way, you can now answer questions in Power BI (via the Neo4j/Fabric connector) that a flat CSV cannot easily answer:

Would you like me to help you write a few Cypher queries you can use in the Fabric/Neo4j browser to verify the data once you've run your first uv run gitpowerdash sync?


yes, let's focus on the GQL thing instead of the intermediate neo4j

why docker?