Using PowerShell to Parse Files
One of the least exciting tasks you can do at work is manually parsing hundreds or thousands of files to find the ones containing specific keyword phrases.
It’s not fun. You have to open them one by one, hit Ctrl+F, and check whether your keyword phrase exists. Tasks like this are not only time-consuming but absolutely dreadful.
I often perform these tasks to search for specific phrases in SQL Server database objects (like stored procedures, functions, and views). Lucky for me, my boss created a nifty script that uses SQL Server’s built-in information schema views. All I do is enter the keyword(s), execute, and wait for the results to populate.
But this only works for database objects. I cannot use this script on files saved to my hard drive.
Here is where PowerShell enters the conversation.
Windows’ scripting language has more power than you might realize. After a few minutes of reading documentation (or using ChatGPT, like I do), you can generate fairly complex commands and scripts that automate your file-parsing needs. Oh, it’s wonderful.
I build and maintain SSRS reports for my organization, and these reports are stored in .rdl and .rsd files. These file types contain the SQL queries used to generate reports. Similar to what I do to find keywords in database objects, I do the same with these SSRS files. However, like I said, I cannot use my nifty SQL script to find the keywords.
So, instead, I resort to PowerShell.
There are two ways I use PowerShell in this scenario.
One is to search for keyword phrases in reports. This includes trying to find specific filters and/or SQL techniques used. The PowerShell script that works is this:
# Define the root directory for your SSRS Repo
$RootDirectory = "C:\Users\mmckee\ssrs_project"
# Define keyword(s) to search... For multiple Keywords, separate with pipe "|"
$Keywords = "LocationId|LocationName"
# Search .rdl and .rsd files
Get-ChildItem -Path $RootDirectory -Recurse -Include *.rdl, *.rsd |
    Where-Object { $_.FullName -notmatch "\\bin\\" } |
    Select-String -Pattern $Keywords |
    ForEach-Object {
        Write-Host "`nFile: $($_.Path)"
        Write-Host "Line $($_.LineNumber): $($_.Line)"
    }
This script parses all files within the root directory and its subdirectories. If it detects a keyword in a file, it prints the file’s path, the line number where the keyword appears, and the line’s content.
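If you want to keep the results instead of scrolling the console, the same pipeline can feed Export-Csv rather than Write-Host. Here’s a minimal sketch, assuming the same $RootDirectory and $Keywords variables as above; the output path is just a placeholder:

```powershell
# Same search as the script above, but the matches are kept as
# objects and saved to a CSV file for later review.
Get-ChildItem -Path $RootDirectory -Recurse -Include *.rdl, *.rsd |
    Where-Object { $_.FullName -notmatch "\\bin\\" } |
    Select-String -Pattern $Keywords |
    Select-Object Path, LineNumber, Line |
    Export-Csv -Path "C:\Temp\keyword_hits.csv" -NoTypeInformation
```

Select-String already emits objects with Path, LineNumber, and Line properties, so no extra parsing is needed.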
The second way I use PowerShell is to search for reports that query data from specific tables and columns in the database. That one looks like this:
# Define directory with SSRS project files
$ssrsProjectPath = "C:\Users\mmckee\ssrs_project"

# Table columns we're searching for
$Columns = @("Business","BusinessType")
$Table = "Brand"

# Find all files that have .rdl or .rsd extension (and are not in bin/ dir)
$ssrsFiles = Get-ChildItem -Path $ssrsProjectPath -Recurse |
    Where-Object { ($_.Extension -eq ".rdl" -or $_.Extension -eq ".rsd") -and $_.FullName -notmatch "\\bin\\" } |
    Select-Object -ExpandProperty FullName

# Initialize array to store reports containing Columns
$ReportsContainingColumns = @()

# Create HashSets to store unique columns and reports/shared datasets
$UniqueColumns = New-Object System.Collections.Generic.HashSet[System.String]
$UniqueReportsAndDatasets = New-Object System.Collections.Generic.HashSet[System.String]

# Iterate through each file in $ssrsFiles
foreach ($file in $ssrsFiles) {
    $content = Get-Content -Path $file -Raw

    # Check if $Table exists in the report or dataset
    if ($content -match $Table) {
        foreach ($col in $Columns) {
            if ($content -match $col) {
                $ReportsContainingColumns += $file + " | Column = " + $col

                # Add column to HashSet if not already there
                if (-not $UniqueColumns.Contains($col)) {
                    $UniqueColumns.Add($col) | Out-Null
                }
            }
        }
        if (-not $UniqueReportsAndDatasets.Contains($file)) {
            $UniqueReportsAndDatasets.Add($file) | Out-Null
        }
    }
}

# Print reports containing the specified table/columns
Write-Output "`nObjects Using Table --> $Table With Specified Columns`n"
$ReportsContainingColumns

Write-Output "`nUnique Reports and Shared Datasets Containing $Table`n"
$UniqueReportsAndDatasets

Write-Output "`nUnique Columns from $Table Used in Reports and Shared Datasets"
$UniqueColumns
This script is more complex, and I used ChatGPT to help me generate it. I won’t walk through every line here, but here’s an overview of what happens:
- We search for all files within the main project path’s subdirectories. We only care about the .rdl and .rsd extensions.
- We create an empty array that’ll later store reports containing the table/columns we specify in $Columns and $Table.
- We initialize two HashSets, $UniqueColumns and $UniqueReportsAndDatasets. The former tracks values from $Columns that appear in a file, while the latter tracks the unique files that reference our table.
- Next, we loop through each file in $ssrsFiles and detect if the content contains our $Table variable. If it does, we loop through all values in $Columns to determine if the file references any of them. If a file’s content does not contain our $Table variable, we proceed to the next item in the loop.
- Finally, we print the following in the terminal:
  - Each file, paired with every specified column it references.
  - The unique reports and shared datasets containing our table.
  - All the unique columns from $Columns that we found in at least one file.
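One caveat worth knowing: -match treats its pattern as a regular expression and matches substrings, so a column name like "Business" also matches every occurrence of "BusinessType". If that skews your results, you can escape the name and wrap it in word boundaries before matching. A small sketch of the idea:

```powershell
$col = "Business"

# Escape the name in case it contains regex metacharacters, then
# require word boundaries so "Business" no longer matches "BusinessType".
$pattern = "\b$([regex]::Escape($col))\b"

"SELECT BusinessType FROM Brand" -match $pattern   # False
"SELECT Business FROM Brand" -match $pattern       # True
```

Swapping $content -match $col for a pattern built this way in the loop above gives you whole-word matching.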
The moral of my story here is that PowerShell is powerful and can help you parse files to search for keywords efficiently. We live in an age where we can make computers work for us and do tedious grunt work. So, there’s no need to perform repetitive manual tasks, like file parsing, when you can automate them with scripting languages.