Find invalid characters using PowerShell

Have you ever copied code from a website or an email, only to have it fail because some of the characters were auto-corrected to curly quotes, long dashes, or the like? I have, and finding those characters can be a real hassle because they look so much like the normal ones.

I wrote this module to find those characters.

From the module description:

Finds out-of-range encoding values (ASCII, ANSI, Unicode) of characters in a file or object. The default “in range” is 32 to 126, which covers the “printable” ASCII characters.
Each line is printed to the console, preceded by a line number. In-range characters are displayed in white on black. Out-of-range characters are displayed in yellow on a red background. At the end of the output is a log of each out-of-range character, listing the line number, character number, the character as it is displayed, and the encoding value.
A custom range can be set by passing an array of in-range INTs. The values do not need to be consecutive; the range just needs to be an array of integers.
This module is especially useful for checking code pasted from email or the web, which might contain unacceptable characters.
Note that out-of-range characters are not always wrong, but in the wrong spot they can cause problems.
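To make the custom range idea concrete, here is a minimal sketch that is separate from the module itself. It assumes you want to allow tabs (value 9) in addition to the default 32 to 126, and it builds the sample string from code points so the snippet contains no curly quotes of its own. The variable names are only for illustration.

```powershell
# Sketch only: a non-consecutive in-range list (tab plus printable ASCII)
$range = @(9) + (32..126)

# A sample line with curly quotes (U+201C and U+201D), as if pasted from a web page
$sample = 'Write-Host ' + [char]0x201C + 'hi' + [char]0x201D

# Collect the encoding value of every character that is not in the range
$found = foreach ($ch in $sample.ToCharArray())
{
    $value = [int]$ch
    if ($value -notin $range) { $value }
}

$found   # 8220 and 8221
```

The same array could be passed to the -Range parameter of the function below.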

<#
.Synopsis
   Finds out of range encoding of characters in a file or object 
.DESCRIPTION
   Finds out-of-range encoding values (ASCII, ANSI, Unicode) of characters in a file or object. The default "in range" is 32 to 126, which covers the "printable" ASCII characters.
   Each line is printed to the console, preceded by a line number. In-range characters are displayed in white on black. Out-of-range characters are displayed in yellow on a red background. At the end of the output is a log of each out-of-range character, listing the line number, character number, the character as it is displayed, and the encoding value.
   A custom range can be set by passing an array of in-range INTs. The values do not need to be consecutive; the range just needs to be an array of integers.
   This module is especially useful for checking code pasted from email or the web, which might contain unacceptable characters.
   Note that out-of-range characters are not always wrong, but in the wrong spot they can cause problems.
.EXAMPLE
   Find-OutOfRangeCharacters -Filename .\fate.ps1
.EXAMPLE
   Find-OutOfRangeCharacters -Object $(Get-Content -Path .\fate.ps1) -Range $(32..96)
.EXAMPLE
   Find-OutOfRangeCharacters -Object "hello wörld"
#>
function Find-OutOfRangeCharacters
{
    [CmdletBinding()]
    Param
    (
        # File of data to be searched
        [Parameter(ParameterSetName='Filename')]
        [ValidateScript({Test-Path $_})]
        [string]
        $Filename,

        # Data to be searched
        [Parameter(ParameterSetName='Object')]
        [string[]]
        $Object,

        # INT array of in-range encoding values
        [int[]]
        $Range = 32 .. 126
    )

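#region retrieve data
    # Check which parameter was bound rather than testing $Object for truthiness,
    # so input such as a single empty string is still treated as -Object input
    if ($PSBoundParameters.ContainsKey('Object'))
    {
        $c = $Object
    }
    else
    {
        # @() keeps a one-line file as a one-element array instead of a bare string
        $c = @(Get-Content -Path $Filename)
    }
#endregion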

    $message = ""

    # loop through each line
    for ($i = 0; $i -lt $c.Length; $i++)
    { 
        # need to display "$i + 1" because arrays count from 0 but humans count from 1
        $PrettyLineNumber = "[" + $($i + 1).ToString("000") + "]" 
        Write-Host -Object $PrettyLineNumber  -NoNewline -ForegroundColor White -BackgroundColor Blue
        $line = $c[$i]

        # loop through each character in the line
        for ($k = 0; $k -lt $line.Length; $k++)
        {
            # get encoding value for the character
            $CharValue = [int][char]$line[$k]

            if ($CharValue -notin $Range) 
            {
                # need to display "$i + 1" and "$k + 1" because arrays count from 0 but humans count from 1
                Write-Host -Object $line[$k] -NoNewline -ForegroundColor Yellow -BackgroundColor Red
                $message += "Line # $($i + 1), character # $($k + 1), displays as `"$($line[$k])`", encoding value $CharValue `n" 
            }
            else
            {
                Write-Host -Object $line[$k] -NoNewline -ForegroundColor White -BackgroundColor Black 
            }
        }
        # write end of line
        Write-Host

    }
    # write log of out of range characters found
    Write-Output $message
}
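
The function above returns its log as one block of text. If you would rather sort or filter the findings, one possible variation is to emit an object per out-of-range character. The sketch below is not part of the module; the function name Get-OutOfRangeCharacter and its -Text parameter are made up for illustration, and it skips the colored console output entirely.

```powershell
# Hypothetical helper, not part of the module above
function Get-OutOfRangeCharacter
{
    param
    (
        # Lines of text to be searched
        [string[]] $Text,

        # INT array of in-range encoding values
        [int[]] $Range = (32..126)
    )

    for ($i = 0; $i -lt $Text.Count; $i++)
    {
        $line = $Text[$i]
        for ($k = 0; $k -lt $line.Length; $k++)
        {
            $value = [int]$line[$k]
            if ($value -notin $Range)
            {
                # "+ 1" because arrays count from 0 but humans count from 1
                [pscustomobject]@{
                    Line      = $i + 1
                    Column    = $k + 1
                    Character = $line[$k]
                    Value     = $value
                }
            }
        }
    }
}

# The o-umlaut (value 246) is built from its code point to keep this snippet pure ASCII
$hits = Get-OutOfRangeCharacter -Text ('hello w' + [char]0xF6 + 'rld')
$hits | Format-Table
```

Because each finding is an object, something like `$hits | Where-Object Value -gt 255` would narrow the list to characters outside the single-byte range.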
