Splitting up a big CSV file to multiple smaller CSV files using PowerShell

Sometimes you have the problem that you get large Excel or CSV lists to work with but you want to split them up by a certain criteria, e.g. company. Your goal may be to be able to create multiple Exchange Hybrid migration batches.

To guide you through this task, my example data looks like this:

Firstname;LastName;Country
Max;Mustermann;Germany
Maria;Musterfrau;Germany
Igor;Smirnow;Russia
John;Dow;USA
Felix;Frisch;Germany

As you can see, it doesn’t matter how the data is sorted.

To get started you must specify some parameters to control the behaviour how PowerShell will split the file:

$GroupField = "Country"
$Delimiter = ";"
$csv = "C:\tmp\MyBigCSVWithTonsOfData.csv"
$outfolder = "C:\tmp\CSV-Files\"

What do these parameters do? It’s simple:

  • $GroupField specified which CSV column will taken as identifier for the split process.
  • $Delimiter specified the delimiter you are using in your CSV.
  • $csv is the full path to the CSV file you want to split up in multiple smaller files.
  • $outfolder is the folderpath where the generated CSV files will be stored.

After the preparation you must read the CSV and prepare the data to be splitted:

$all = Import-Csv $csv -Delimiter $Delimiter
$groupedCollection = $all | Group-Object $GroupField

You see I use the command “Group-Object” which acts like the Excel function “Filter”. With this command you can group structured data and the result are multiple groups which contain the single entries. So if your CSV input file contained 5 entries, myou group by country and 3 of the people have “Germany”, 1 has “USA” and one has “Russia” the result of these 2 lines will be a return value of 3 groups:

  • One group with 3 entries – the people with country “Germany”
  • One group with 1 entry – the person with country “USA”
  • One group with 1 entry – the person with country “Russia”

PSGroup

Having a deeper look at the return value, you can see that the variable $groupcollection is an array. You can check this by executing this command:

$groupCollection.GetType()

Working with an array is easy, you probably know it from other scripts you wrote with PowerShell. Just create a foreach-loop and iterate through the items:

foreach($group in $groupedCollection)
{
   Write-Host ("Group '" + $group.Name + "' // " + $group.Count.ToString() + " Members")
   $group.Group | ConvertTo-Csv -NoTypeInformation -Delimiter "," | Out-File ($outfolder + $group.Name + ".csv")
}

And voila, the result is that you have 3 CSV files – one for each country:

PSGroup2

Now you can continue your work with the new smaller CSV files. As you may have noticed, I’ve hard-coded the output files with delimiter “,”.  Of course, if you need another delimiter, feel free to adjust it to your needs.

So after all that you can copy/paste the whole tiny script at once here:

$GroupField = "Company"
$Delimiter = ";"
$csv = "S:\CSV-Files\MyBigCSVWithTonsOfData.csv"
$outfolder = "S:\CSV-Files\"

$all = Import-Csv $csv -Delimiter $Delimiter
$groupedCollection = $all | Group-Object $GroupField

foreach($group in $groupedCollection)
{
   Write-Host ("Group '" + $group.Name + "' // " + $group.Count.ToString() + " Members")
   $group.Group | ConvertTo-Csv -NoTypeInformation -Delimiter "," | Out-File ($outfolder + $group.Name + ".csv")
}

Create multiple hybrid migration batches with PowerShell

Sometimes in Exchange Online migration projects you have the demand to make not few big batches but many small ones. In the end this means you have many CSV files which you want to use for bulk creation of migration batches.

You can either create them by using the GUI (which is not really fun when you have more than 10 CSV files) or by using this tiny PowerShell command. You have to replace the highlighted values with your own ones:

Get-ChildItem *.csv  | Foreach-Object{ New-MigrationBatch -Name ($_.Name -replace ".csv","") -TargetDeliveryDomain "TENANTNAME.mail.onmicrosoft.com" -AutoStart -AllowUnknownColumnsInCsv $true -NotificationEmails "" -CSVData ([System.IO.File]::ReadAllBytes( $_.FullName)) -BadItemLimit 99999 -LargeItemLimit 99999 -AllowIncrementalSyncs $true -SourceEndpoint "NAME OF YOUR HYBRID ENDPOINT"}

 

This command searches for all CSV files in the current folder and creates a migration batch for each CSV file with the following attributes:

  • The name of the batch will be the file name of the CSV file without the file extension
  • The batch will start automatically but has to be completed manually
  • The notification emails will be sent to the email you provide here
  • The CSV files may have any column, but the column “EmailAddress” must be present
  • The batches will perform incremental syncs
  • the LargeItemLimit and BadItemLimit are very high to ensure the mailboxes will not skip because of items that can’t be migrated
  • The hybrid endpoint of your organization will be taken for the move

Of course you may adjust this command depending on your needs, e.g. Auto Completion or a lower BadItemLimit.

In my case there was a limit of 100 migration batches. The Exchange Online Service Description doesn’t mention this limit, but be aware of the fact that this may hit you as well.

Happy migrating 🙂

Disable OWA attachment download

Some companies’ security policies recommend that it must be ensured that no company data will be saved on “non-company” devices. A first step to achieve that is to disable attachment download.  To do this, you can just remove the checkbox in ECP:

OWA file access

You can also create a new OWA policy and specify the following:

OWA file access2

If you are more the PowerShell Guy:

Get-OwaMailboxPolicy | Set-OwaMailboxPolicy -DirectFileAccessOnPublicComputersEnabled $false -DirectFileAccessOnPrivateComputersEnabled $false

The result is that the attachments cannot be downloded any more:

OWA file access3

The cool thing is that viewing attachments in Office Online is still possible.

There are more features which can be disabled to gain more security which will be discussed in separate articles.

Disconnect inactive RDP sessions

Many admins know the problem: they don’t have terminal servers for administration, so there are 2 RDP sessions to a server possible. And when connecting to  a server there are either no free sessions or many disconnected (not “logged-off”) sessions which are cosuming resources which is slowing down the server.

If you only want to see the sessions, you can execute only this:

query user | select -Skip 1 | ? {($_ -split "\s+")[-4] -eq 'Disc'} 

 

To disconnect all these sessions at once, you can execute this command inside an elevated PowerShell window:

query user | select -Skip 1 | ? {($_ -split "\s+")[-4] -eq 'Disc'} | % {logoff ($_ -split "\s+")[-5] /v}

 

 

And then all disconnected RDP sessions are forced logoff and the resources are free.

 

ADFS Proxy – An error occurred when attempting to establish a trust relationship with the federation service

This is a really weird and annoying error which can drive you crazy. But let’s start from the beginning. So what do we have?

  • An Office 365 tenant
  • An ADFS server in the internal network
  • An ADFS Proxy (a WAP) in the perimeter network
  • a wildcard certificate which was issued by a public CA

So up to not nothing special. The ADFS server configures well and is up & running. The firewall between the ADFS and the ADFS proxy was opened on port 443 so that these both can communicate with each other. So I’ve started the configuration of the WAP server, entering all the necessary data and then this error raised:

WAP Error GUI

Trying the configuration with PowerShell didn’t work better:

WAP Error Shell

So the first view was to the eventlog of the machines. What did I see?

  • On the ADFS proxy: No entries – neither in the Application not the ADFS eventlog. Yeah. *Happiness*
  • On the ADFS server: Event ID 364 with not helpful descriptions like this:
    • Encountered error during federation passive request. […] Contact your administrator for details – and a long stacktrace

Opening one of the ADFS websites from the ADFS proxy the following error raises:

ADFS TLS

It seems that the certificate is not presented well from the ADFS to the WAP and the error message in Internet Explorer is useless.

So.. what to do next? Start to google and analyze the traffic using fiddler. To shorten the story a little bit: In the fiddler logs I could see there is a problem with the certificate, but this may also be related to fiddlers SSL-decryption feature.

Googling around this error brings up a ton of tips and tricks what it could be, for example:

  • Certificate error: https://support.microsoft.com/en-us/help/3044974
  • About the Cipher Suites:
    • https://support.microsoft.com/en-us/help/3194197/considerations-for-disabling-and-replacing-tls-1.0-in-adfs
    • http://s4b-usergroup.com/office365-blog/adfs-3-0-tls-error/
    • really cool explanation: https://blogs.technet.microsoft.com/keithab/2015/06/22/error-while-configuring-wapthe-underlying-connection-was-closedpart-2/

But all this did not work.

 

So after 4 days of troubleshooting, re-installing and investigating I decided to begin from scratch and check each and every point again. And at this point the firewall guys told me: “Yes port 443 is open. Yes we have content inspection running”.

And here we go. Disabling content inspection solved the problem for this issue, now the ADFS and the WAP can communicate with each other. And the morale of this course: doublecheck with the firewall guys and the network security instruments. This can save you a lot of days for troublehsooting 🙂