In one of the DevOps automation testing environments I work with, I recently came across an AD OU that had over 75,000 unused computer records. This environment is used for repeated testing of entire automation stacks with unique computer names, so it is normal that these records would pile up. While this particular OU was for Linux machines, the problem obviously affected both Windows and Linux across all OUs.
Being that it is the year 2018 I thought finding a ready-made solution on the web would be child’s play (oh Murphy - why do you plant those thoughts in my head!).
As it is with the Toolsmithing nature, when I could not find a simple solution, I had a strong desire to conjure a tool for everyone.
The Existing Solutions
Due to the years-long need to purge inactive computer domain join records, there are a lot of existing solutions. Many of these are part of a software product that requires installation, or part of commercial PowerShell modules. Others are pages-long scripts that take time to review to ensure they touch only what I want them to. But a long-standing commitment to favor using only what ships on the box drove me to find something that did not require third party or full-on software installations - and if possible not even PowerShell module installs (as they are not easy to accomplish on all versions of PowerShell).
The Solution
Here are the attributes and benefits of the solution (I think both you and I appreciate bullets rather than paragraphs eh?):
- It only requires PowerShell 3 and the ActiveDirectory module which is installed on any Domain Controller.
- It successfully runs under SYSTEM account in the task scheduler (eliminating any dependency on admin credentials encoded into the scheduled task)
- You can configure the purge threshold (how many days since the machine last logged onto the domain)
- It has a report only mode so that you can audit exactly what will happen when running under the task scheduler as the SYSTEM account (Read “GOTCHA!” below.)
- It generates a report file of each run so you can audit what it has done (but only if it has work to do)
- Writes log to the $env:public folder to avoid the need for extra code to create a folder and to avoid losing reports if temp is cleaned.
Anatomy Of The Code
This can be scheduled as SYSTEM on a domain controller, and it creates a CSV of whatever it attempted to remove. AFTER RUNNING A TEST REPORT SCHEDULED UNDER THE SYSTEM ACCOUNT, change $PurgeThreshold to less than 10 years and set $ReallyDelete to $True. It is very, very important that the days are negative - otherwise all computer records will be targeted.
$ReallyDelete = $False ; $PurgeThreshold = -3650 ; $RemoveList = @(Search-ADAccount -AccountInactive -DateTime (get-date).AddDays($PurgeThreshold) -ComputersOnly) ; If ($RemoveList.count -lt 1) {Exit 0} ; $RemoveList | Sort-Object LastLogonDate | Select-Object Name, LastLogonDate, DistinguishedName, SID, ObjectGUID | Export-Csv -NoTypeInformation -Path "$env:public\AD-ComputerCleanUp-At-$(Get-date -format 'yyyyMMddHHmm').csv" ; If ($ReallyDelete) {$RemoveList | Remove-ADComputer -Confirm:$False}
Note: It is best to get the code from the repository link at the bottom as copying from a browser is a notorious way to introduce strange artifacts into your code.
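The warning about negative days can be turned into a guard. The following is a minimal sketch, not part of the original one-liner: Get-PurgeCutoff is a hypothetical helper name, and the -Now parameter exists only so the date math can be checked against a fixed date.

```powershell
# Hypothetical helper (an assumption for illustration, not in the original code).
# Converts a negative day count into the cutoff date handed to Search-ADAccount,
# and refuses zero or positive values, which would target every computer record.
function Get-PurgeCutoff {
    param(
        [Parameter(Mandatory)] [int] $PurgeThreshold,
        [datetime] $Now = (Get-Date)
    )
    if ($PurgeThreshold -ge 0) {
        throw "PurgeThreshold must be negative (got $PurgeThreshold)."
    }
    # Adding a negative number of days moves the cutoff into the past.
    $Now.AddDays($PurgeThreshold)
}
```

Calling Get-PurgeCutoff -PurgeThreshold -3650 returns a date roughly ten years back, while a value such as 30 throws before any AD query is made.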
Yes it is a very long oneliner and the semi-colons technically aren’t what is meant by oneliner - but I wanted a single line because it allows setting it up in the task scheduler without having to transit a separate file. If you wish, you can store it as a file and tidy it up for human readability. It can also be run at the command line to test the report.
The code uses the Search-ADAccount cmdlet, which has the ability to find inactive computer accounts. If there are no computers to purge, the code exits. If there are computers to process, the code generates a CSV report of what it is about to attempt to delete. Finally, if $ReallyDelete = $True, it will actually delete the computer records.
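If you prefer to store the code as a file, the same logic might be laid out as below. This is only a tidied sketch of the one-liner with comments added. The $Cutoff and $ReportPath variables are introduced here for readability, and "return" replaces "Exit 0" so that pasting it into an interactive session does not close the window.

```powershell
# Tidied sketch of the one-liner; intended to be saved as a .ps1 file.
Import-Module ActiveDirectory

$ReallyDelete   = $False   # report-only until deliberately set to $True
$PurgeThreshold = -3650    # days of inactivity; MUST be negative

# Find computer accounts that have been inactive since before the cutoff date.
$Cutoff     = (Get-Date).AddDays($PurgeThreshold)
$RemoveList = @(Search-ADAccount -AccountInactive -DateTime $Cutoff -ComputersOnly)

# Nothing to do: stop without writing a report.
if ($RemoveList.Count -lt 1) { return }

# Write the audit report before attempting any deletion.
$ReportPath = "$env:public\AD-ComputerCleanUp-At-$(Get-Date -Format 'yyyyMMddHHmm').csv"
$RemoveList |
    Sort-Object LastLogonDate |
    Select-Object Name, LastLogonDate, DistinguishedName, SID, ObjectGUID |
    Export-Csv -NoTypeInformation -Path $ReportPath

# Delete only when explicitly enabled.
if ($ReallyDelete) {
    $RemoveList | Remove-ADComputer -Confirm:$False
}
```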
How To Use The Code
Set $ReallyDelete = $False and set $PurgeThreshold to a value that is likely to generate only a small number of target machines. Run the code in an interactive prompt and review the generated report file to ensure the code is targeting exactly what was intended. Adjust your threshold to target fewer than 100 machines.
With the same settings…
- on ONLY ONE domain controller,
- create a scheduled task to run as SYSTEM with no password
- set the schedule to be repeating - but
- set the first execution to be days in the future.
Note: Do not delete the same computer records on multiple DCs. Using SYSTEM prevents the bad security practice of embedding credentials. If you do not run on a DC, you will need domain admin credentials defined in the scheduled task.
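The task setup described above could also be scripted. The sketch below assumes the ScheduledTasks module is available (it ships with Windows Server 2012 and later) and that the code has been saved to a file. The task name, script path, and weekly schedule are hypothetical placeholders, so adjust them and confirm the resulting start date in Task Scheduler.

```powershell
# Sketch only: task name, script path, and schedule are hypothetical placeholders.
$TaskName   = 'AD-ComputerCleanUp'
$ScriptPath = 'C:\Scripts\AD-ComputerCleanUp.ps1'

# Run the saved script with Windows PowerShell.
$Action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument "-NoProfile -File `"$ScriptPath`""

# Repeat weekly, with the first scheduled run a week from today at 02:00.
$FirstRun = (Get-Date).Date.AddDays(7).AddHours(2)
$Trigger  = New-ScheduledTaskTrigger -Weekly -DaysOfWeek $FirstRun.DayOfWeek -At $FirstRun

# Run as SYSTEM so no credentials are stored in the task.
$Principal = New-ScheduledTaskPrincipal -UserId 'NT AUTHORITY\SYSTEM' `
    -LogonType ServiceAccount -RunLevel Highest

Register-ScheduledTask -TaskName $TaskName -Action $Action `
    -Trigger $Trigger -Principal $Principal
```

With the task registered this way, Start-ScheduledTask -TaskName 'AD-ComputerCleanUp' is the command-line equivalent of forcing a run from the Task Scheduler console.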
Right-click the task and force it to run.
Review the generated report file to ensure the code is targeting exactly what was intended (the same small set of machines).
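One way to review the report without hunting for the file name is sketched below. It assumes the report naming pattern from the one-liner and simply loads the most recent file. The summary message wording is my own.

```powershell
# Locate the newest report written by the cleanup code.
$Latest = Get-ChildItem -Path "$env:public\AD-ComputerCleanUp-At-*.csv" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

if ($Latest) {
    # @() keeps .Count reliable even when the report has a single row.
    $Report = @(Import-Csv -Path $Latest.FullName)
    "{0}: {1} computer record(s) targeted" -f $Latest.Name, $Report.Count
    $Report | Select-Object Name, LastLogonDate | Format-Table -AutoSize
} else {
    'No report found. Either the task has not run or it had nothing to purge.'
}
```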
Update the task to set $ReallyDelete = $True and force run the task again. Examine the report AND Active Directory to verify the results.
Update the task with the long term value for $PurgeThreshold. Force run the task again so that the first big purge happens while you are able to audit.
IMPORTANT: If you have a lot of records to initially delete, it may be wise to do them in smaller batches with time for AD to sync in between each one. You can play with reporting mode on a command line to figure out what dates create what batch sizes and then run several batches over a period of days so as not to push too many deletes all at once.
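To get a feel for batch sizes, a report-only loop like the following can count how many records each candidate threshold would target. The threshold values are illustrative assumptions, and nothing here deletes anything.

```powershell
Import-Module ActiveDirectory

# Illustrative candidate thresholds (negative days), oldest first.
$Thresholds = @(-3650, -1825, -730, -365, -180)

foreach ($Days in $Thresholds) {
    $Cutoff = (Get-Date).AddDays($Days)
    # Count only; no report file is written and nothing is removed.
    $Count = @(Search-ADAccount -AccountInactive -DateTime $Cutoff -ComputersOnly).Count
    [pscustomobject]@{
        PurgeThreshold = $Days
        Cutoff         = $Cutoff
        Targeted       = $Count
    }
}
```

The difference between consecutive rows approximates the size of each batch if you step the threshold from the oldest value toward the long term one.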
Special Case: Multiple Scheduled Instances
If you want to run this on multiple DCs for resilience, ensure that enough time elapses between the scheduled runs for AD replication from the first task to complete before the next one starts. It would be best if this elapsed time is in days - not hours - especially for environments with many DCs. When calculating elapsed time, don’t forget to account for DCs in different timezones.
What Threshold Should Be Used To Remove Machines?
Domain-joined Windows machines have an account name (the computer name with “$” on the end) and a machine account password, and they regularly log on to the domain to keep their domain trust relationship active. By default Windows changes this password every 30 days. If the machine is not logging in regularly, this password change results in a broken trust. To reestablish domain membership the machine must be removed and re-added to the domain. This is why it is considered safe to forcibly remove a machine from the domain if it has not logged in for 30 days.
However! If your company has extended this value or disabled it on some computers due to infrequent domain connectivity (e.g. VM templates/snapshots, very remote workers) - then 30 days might be too soon for you. It’s important to know if this value is being managed in one of these extended ways to be able to determine the appropriate window. Keep in mind that if this is running on a schedule, there is no need to get uptight about getting these machines out as soon as possible. As long as those that exceed the threshold are being automatically purged, it shouldn’t matter how long that threshold is, because it does not require human attention.
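As a rough way to check how this value is managed, the sketch below reads the Netlogon settings on the machine where it runs and then lists the computer accounts with the oldest password changes. It assumes the settings are surfaced in the Netlogon Parameters registry key (the Group Policy settings "Domain member: Maximum machine account password age" and "Domain member: Disable machine account password changes" map to these values). The registry part only describes the local machine, so run it on a representative member machine rather than drawing conclusions from a DC.

```powershell
# Local machine: effective Netlogon password-change settings.
# A blank value means the setting is not present and the Windows default applies.
$Key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
Get-ItemProperty -Path $Key |
    Select-Object MaximumPasswordAge, DisablePasswordChange

# Directory side: computers whose machine password was changed longest ago.
# Note: -Filter * reads every computer object, which can be slow in large domains.
Import-Module ActiveDirectory
Get-ADComputer -Filter * -Properties PasswordLastSet, LastLogonDate |
    Sort-Object PasswordLastSet |
    Select-Object -Property Name, PasswordLastSet, LastLogonDate -First 20
```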
GOTCHA!
I ran into a serious gotcha with Search-ADAccount that I can’t explain - but that nearly sank me. When first prototyping the command I used the -TimeSpan parameter with the set number of days I wanted to purge. When I ran it from an interactive command line to report on the machines it would delete, I got 31. When I ran the delete in the same context, it removed only those 31 machines. I then modified the days parameter to get about 50 machines and ran it under the SYSTEM account on a domain controller. For some reason it listed every machine in the domain from this context. I have seen many odd behaviors when running code as SYSTEM and/or under the scheduler - but this one could have been disastrous.
I refactored the code to use the -DateTime parameter and it worked identically in an interactive command prompt and under the scheduler using SYSTEM account.
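If you want to see whether your environment shows the same discrepancy, a report-only comparison along these lines can be run both interactively and as a scheduled task under SYSTEM. The 90 day value and the log file name are illustrative assumptions, and nothing is deleted.

```powershell
Import-Module ActiveDirectory

$Days = 90   # illustrative inactivity window

# Same question asked two ways: a relative time span versus an absolute cutoff date.
$ByTimeSpan = @(Search-ADAccount -AccountInactive -TimeSpan (New-TimeSpan -Days $Days) -ComputersOnly).Count
$ByDateTime = @(Search-ADAccount -AccountInactive -DateTime (Get-Date).AddDays(-$Days) -ComputersOnly).Count
$Total      = @(Get-ADComputer -Filter *).Count

# Record which account ran the check so interactive and SYSTEM runs can be compared.
$Who  = [Security.Principal.WindowsIdentity]::GetCurrent().Name
$Line = "{0} as {1}: TimeSpan={2} DateTime={3} AllComputers={4}" -f (Get-Date -Format s), $Who, $ByTimeSpan, $ByDateTime, $Total

# Hypothetical log file name, written to the same folder as the reports.
Add-Content -Path "$env:public\AD-SearchContextCheck.log" -Value $Line
```

If the TimeSpan count matches AllComputers in one context but not the other, you are seeing the behavior described above.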
This reinforced several disciplines I try to keep aware of:
- Always test the EXACT execution context of your code before turning it on.
- Do super-paranoid testing when that code does mass deletes of any type.
- Never assume that the task scheduler nor SYSTEM account context are going to act the same as interactive testing under an admin account.
- From the FIRST LINE OF CODE - create an audit or whatif mode for any code that can have devastating operational consequences - it helps you debug the code as well as being a best practice for daily operations of the code. This idea is, in some ways, akin to Test Driven Development (TDD).
Possible Modifications
This code can easily be modified to:
- disable inactive records for a period of time before deletion (although it may be just as easy to create a longer threshold value like 1 year).
- purge unused user accounts - with user accounts there is not an automatic period of dormancy that would indicate they are unused - so be careful here.
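As a starting point for these modifications, here is a hedged sketch. The $DisableThreshold name and its value are assumptions, -WhatIf is left on so nothing changes until you remove it, and the user account query is report-only.

```powershell
Import-Module ActiveDirectory

$DisableThreshold = -365   # illustrative; must be negative, like $PurgeThreshold
$Cutoff = (Get-Date).AddDays($DisableThreshold)

# Variation 1: disable stale computer accounts instead of deleting them.
# -WhatIf only prints what would happen; remove it once the output looks right.
@(Search-ADAccount -AccountInactive -DateTime $Cutoff -ComputersOnly) |
    Where-Object { $_.Enabled } |
    Disable-ADAccount -WhatIf

# Variation 2: list inactive user accounts for manual review (no changes made).
Search-ADAccount -AccountInactive -DateTime $Cutoff -UsersOnly |
    Sort-Object LastLogonDate |
    Select-Object Name, LastLogonDate, Enabled
```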