Before you remediate a brownfield Azure environment, you need to know exactly what you’re working with. This post provides a systematic approach to discovering and documenting your current Azure state, producing output that’s both human-readable and structured for LLM-driven remediation.
Prerequisites#
Setup#
# Ensure Azure CLI is current
az upgrade
# Install Resource Graph extension
az extension add --name resource-graph
# Clone Microsoft's review checklists
git clone https://github.com/Azure/review-checklists.git
Create Service Principal for Discovery#
Create a dedicated service principal with exactly the permissions needed for discovery. This makes the run reproducible and the permission requirements explicit. You can also delete the service principal once the work is done, in keeping with least-privilege and zero-trust principles.
# Define your variables
MG_ID="contoso-root" # Your root management group ID
SP_NAME="sp-alz-discovery" # Service principal name
# Create service principal and capture output
SP_OUTPUT=$(az ad sp create-for-rbac --name "$SP_NAME" --skip-assignment -o json)
# Parse credentials from JSON output
SP_APP_ID=$(echo "$SP_OUTPUT" | jq -r '.appId')
SP_PASSWORD=$(echo "$SP_OUTPUT" | jq -r '.password')
TENANT_ID=$(echo "$SP_OUTPUT" | jq -r '.tenant')
# Assign Reader role at management group level
az role assignment create \
--assignee "$SP_APP_ID" \
--role "Reader" \
--scope "/providers/Microsoft.Management/managementGroups/$MG_ID"
# Login as the service principal
az login --service-principal \
--username "$SP_APP_ID" \
--password "$SP_PASSWORD" \
--tenant "$TENANT_ID"
# Verify access
az account show --query "{Name:name, Id:id, TenantId:tenantId}"
Note: Store the service principal credentials securely. You’ll delete the SP after discovery is complete.
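The parsing step above relies on `jq -r`, which prints the literal string `null` when a key is missing from the JSON. A minimal sketch of a guard that could run before the login step is shown here; `require_value` is a hypothetical helper name, not part of the Azure CLI.

```shell
# Hypothetical helper (not an Azure CLI command): fail fast when a parsed
# credential is empty or the literal string "null", which is what `jq -r`
# prints when the requested key is absent.
require_value() (
  name=$1
  value=$2
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "Missing value for $name" >&2
    exit 1
  fi
)

# Example usage after parsing SP_OUTPUT (commented out so the sketch
# can be sourced without side effects):
# require_value SP_APP_ID "$SP_APP_ID" &&
#   require_value SP_PASSWORD "$SP_PASSWORD" &&
#   require_value TENANT_ID "$TENANT_ID" || exit 1
```

The function body runs in a subshell, so the `exit 1` only sets the function's return status and does not terminate the calling script.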
Phase 1: Environment Inventory#
Start by mapping your management group hierarchy and resource distribution.
Management Group Structure#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Full hierarchy tree
az account management-group list --query "[].{Name:displayName, Id:name, Parent:details.parent.displayName}" -o table
# Subscriptions directly under the management group (nested groups carry their own children arrays)
az account management-group show --name "$MG_ID" --expand --recurse \
--query "children[?type=='/subscriptions'].{Sub:displayName, Id:name}"
Subscription Inventory#
# List all subscriptions with state
az account list --query "[].{Name:name, Id:id, State:state}" -o table
# Resource count per subscription
az graph query -q "
summarize count() by subscriptionId
| project subscriptionId, resourceCount=count_
" --management-groups "$MG_ID" -o table
Resource Distribution#
# Resources by type across all subscriptions
az graph query -q "
summarize count() by type
| order by count_ desc
| project type, count_
" --management-groups "$MG_ID" -o json > resource_inventory.json
# Resources by location
az graph query -q "
summarize count() by location
| order by count_ desc
" --management-groups "$MG_ID" -o table
Phase 2: Automated Assessment with Microsoft Checklists#
Microsoft’s review-checklists repository contains Azure Resource Graph queries mapped to Landing Zone best practices.
ALZ Checklist Overview#
The alz_checklist.en.json covers:
- Identity and access management
- Network topology and connectivity
- Security governance
- Management and monitoring
- Platform automation
- Subscription organization
Running the Assessment#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
cd review-checklists/scripts
# Run against your management group hierarchy
./checklist_graph.sh \
-m "$MG_ID" \
-c ../checklists/alz_checklist.en.json \
-o json > alz_assessment.json
# For text output (human review)
./checklist_graph.sh \
-m "$MG_ID" \
-c ../checklists/alz_checklist.en.json \
-o text > alz_assessment.txt
The JSON output contains compliance status for each checklist item with supporting evidence from your environment.
Phase 3: Targeted Discovery Queries#
Supplement the automated assessment with these targeted queries for each Landing Zone pillar.
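Each query in this phase follows the same run-and-save pattern, and `az graph query` returns at most 100 rows per call unless `--first` is set (the maximum is 1000). The sketch below wraps that pattern in one function; `run_graph_query` is a hypothetical helper name, and the error handling is an assumption about what is useful here rather than anything the Azure CLI requires.

```shell
# Hypothetical wrapper: run one Resource Graph query against a management
# group and save the JSON result to a file.
run_graph_query() (
  query=$1
  mg_id=$2
  out_file=$3
  # --first raises the per-call row cap from the default of 100 to 1000.
  if ! az graph query -q "$query" --management-groups "$mg_id" \
      --first 1000 -o json > "$out_file"; then
    echo "Query failed, see $out_file" >&2
    exit 1
  fi
  # An empty file usually means the command produced nothing usable.
  if [ ! -s "$out_file" ]; then
    echo "Query produced no output: $out_file" >&2
    exit 1
  fi
  echo "Saved $out_file"
)

# Example usage (commented out so the sketch can be sourced safely):
# run_graph_query "resources | where type == 'microsoft.network/virtualnetworks'" \
#   "$MG_ID" network_vnets.json
```

Environments with more than 1000 matching rows would still need paging, for example with the `--skip` option of `az graph query`.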
Governance#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Policy assignments across management groups
az graph query -q "
policyresources
| where type == 'microsoft.authorization/policyassignments'
| project name, properties.displayName, properties.scope, properties.policyDefinitionId
" --management-groups "$MG_ID" -o json > governance_policies.json
# Custom policy definitions
az graph query -q "
policyresources
| where type == 'microsoft.authorization/policydefinitions'
| where properties.policyType == 'Custom'
| project name, properties.displayName, properties.policyRule
" --management-groups "$MG_ID" -o json > governance_custom_policies.json
# Role assignments (all roles; join against role definitions to isolate custom roles)
az graph query -q "
authorizationresources
| where type == 'microsoft.authorization/roleassignments'
| project principalId=properties.principalId, roleDefinitionId=properties.roleDefinitionId, scope=properties.scope
" --management-groups "$MG_ID" -o json > governance_rbac.json
Networking#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Virtual networks and peerings
az graph query -q "
resources
| where type == 'microsoft.network/virtualnetworks'
| project name, resourceGroup, subscriptionId,
addressSpace=properties.addressSpace.addressPrefixes,
peerings=properties.virtualNetworkPeerings
" --management-groups "$MG_ID" -o json > network_vnets.json
# NSG rules analysis
az graph query -q "
resources
| where type == 'microsoft.network/networksecuritygroups'
| project name, resourceGroup, subscriptionId,
rules=properties.securityRules
" --management-groups "$MG_ID" -o json > network_nsgs.json
# Private endpoints
az graph query -q "
resources
| where type == 'microsoft.network/privateendpoints'
| project name, resourceGroup, subscriptionId,
targetResource=properties.privateLinkServiceConnections[0].properties.privateLinkServiceId
" --management-groups "$MG_ID" -o json > network_private_endpoints.json
# Public IPs
az graph query -q "
resources
| where type == 'microsoft.network/publicipaddresses'
| project name, resourceGroup, subscriptionId,
ipAddress=properties.ipAddress,
allocation=properties.publicIPAllocationMethod
" --management-groups "$MG_ID" -o json > network_public_ips.json
Identity#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Managed identities
az graph query -q "
resources
| where identity.type has 'SystemAssigned' or identity.type has 'UserAssigned'
| project name, type, resourceGroup, identityType=identity.type
" --management-groups "$MG_ID" -o json > identity_managed.json
# Key Vault access policies
az graph query -q "
resources
| where type == 'microsoft.keyvault/vaults'
| project name, resourceGroup, subscriptionId,
accessPolicies=properties.accessPolicies,
enableRbac=properties.enableRbacAuthorization
" --management-groups "$MG_ID" -o json > identity_keyvault_access.json
Security#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Defender for Cloud coverage
az graph query -q "
securityresources
| where type == 'microsoft.security/pricings'
| project name, pricingTier=properties.pricingTier
" --management-groups "$MG_ID" -o json > security_defender.json
# Storage account security settings
az graph query -q "
resources
| where type == 'microsoft.storage/storageaccounts'
| project name, resourceGroup, subscriptionId,
httpsOnly=properties.supportsHttpsTrafficOnly,
minTls=properties.minimumTlsVersion,
publicAccess=properties.allowBlobPublicAccess
" --management-groups "$MG_ID" -o json > security_storage.json
# SQL Server admin login and minimum TLS version
az graph query -q "
resources
| where type == 'microsoft.sql/servers'
| project name, resourceGroup, subscriptionId,
adminLogin=properties.administratorLogin,
minTls=properties.minimalTlsVersion
" --management-groups "$MG_ID" -o json > security_sql.json
Operations#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
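# Diagnostic settings coverage (rough signal only: diagnostic settings are generally not
# exposed as a property in the resources table, so spot-check with
# az monitor diagnostic-settings list --resource <resource-id>)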
az graph query -q "
resources
| extend hasdiag = isnotnull(properties.diagnosticSettings)
| summarize count() by type, hasdiag
| order by count_ desc
" --management-groups "$MG_ID" -o json > ops_diagnostics.json
# Tag compliance
az graph query -q "
resources
| where isnull(tags) or tostring(tags) == '{}'
| summarize count() by type
| order by count_ desc
" --management-groups "$MG_ID" -o json > ops_missing_tags.json
# Resource naming patterns
az graph query -q "
resources
| project name, type
| extend prefix = substring(name, 0, 4)
| summarize count() by prefix, type
| order by count_ desc
" --management-groups "$MG_ID" -o json > ops_naming.json
Structuring Output for LLM Execution#
Consolidate your findings into a structured format that an LLM can parse and act upon.
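Before consolidating, it can help to confirm that every discovery file was actually produced, since a failed query leaves a missing or empty file behind. A minimal sketch follows; `check_discovery_files` is a hypothetical helper name, and the file names passed to it are the ones written by the Phase 3 queries.

```shell
# Hypothetical helper: report discovery files that are missing or empty.
# Prints one line per problem file and returns non-zero if any were found.
check_discovery_files() (
  dir=$1
  shift
  status=0
  for f in "$@"; do
    # -s is true only when the file exists and has a size greater than zero.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  exit "$status"
)

# Example usage (commented out so the sketch can be sourced safely):
# check_discovery_files . governance_policies.json network_vnets.json \
#   identity_managed.json security_defender.json ops_diagnostics.json
```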
Recommended Output Format#
# Define your variables
MG_ID="contoso-root" # Your root management group ID
# Create consolidated discovery output
cat > discovery_output.json << EOF
{
"assessment_date": "$(date -I)",
"management_group_root": "$MG_ID",
"summary": {
"total_subscriptions": 0,
"total_resources": 0,
"alz_checklist_compliance": "see alz_assessment.json"
},
"findings_by_pillar": {
"governance": {
"files": ["governance_policies.json", "governance_custom_policies.json", "governance_rbac.json"],
"key_gaps": []
},
"networking": {
"files": ["network_vnets.json", "network_nsgs.json", "network_private_endpoints.json", "network_public_ips.json"],
"key_gaps": []
},
"identity": {
"files": ["identity_managed.json", "identity_keyvault_access.json"],
"key_gaps": []
},
"security": {
"files": ["security_defender.json", "security_storage.json", "security_sql.json"],
"key_gaps": []
},
"operations": {
"files": ["ops_diagnostics.json", "ops_missing_tags.json", "ops_naming.json"],
"key_gaps": []
}
}
}
EOF
Including Context with Findings#
When passing to an LLM for remediation planning, include:
- Current state - The JSON output files from discovery
- Target state - Azure Landing Zone best practices (reference CAF documentation)
- Constraints - Business requirements, timeline, change windows
- Scope - Which pillars to prioritize
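The four inputs above can be assembled into a prompt file per pillar. The sketch below is one possible way to do that; `build_prompt` is a hypothetical helper name, and the wording it emits is an assumption modeled on the context items listed here.

```shell
# Hypothetical helper: print a remediation prompt for one pillar.
# Arguments: pillar name, constraints text, then the discovery files to list.
build_prompt() (
  pillar=$1
  constraints=$2
  shift 2
  echo "## Context"
  # "$*" joins the remaining arguments (the file names) with spaces.
  echo "- Discovery files attached: $*"
  echo "- Target: Azure Landing Zone best practices per CAF"
  echo "- Constraints: $constraints"
  echo "## Request"
  echo "Analyze the discovery output and create a prioritized remediation plan for $pillar."
)

# Example usage (commented out so the sketch can be sourced safely):
# build_prompt networking "Production, CAB approval required" \
#   network_vnets.json network_nsgs.json > prompt_networking.md
```

Generating one prompt per pillar keeps the scope explicit and limits how many discovery files have to be attached at once.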
Example prompt structure for LLM:
## Context
- Discovery files attached: [list files]
- Target: Azure Landing Zone best practices per CAF
- Constraints: Production environment, changes require change advisory board approval
## Request
Analyze the discovery output and create a prioritized remediation plan for [pillar].
Group actions by:
1. Quick wins (low risk, high impact)
2. Foundational changes (required for other improvements)
3. Optimization (nice to have)
For each action, provide:
- Azure CLI or PowerShell command
- Risk assessment
- Dependencies
Cleanup#
After discovery is complete, delete the service principal to maintain security hygiene.
# Define your variables
SP_NAME="sp-alz-discovery" # Service principal name used during setup
# Logout from service principal session
az logout
# Login with your regular account (with permissions to manage service principals)
az login
# Delete the service principal
az ad sp delete --id "$(az ad sp list --display-name "$SP_NAME" --query "[0].appId" -o tsv)"
# Verify deletion
az ad sp list --display-name "$SP_NAME" --query "[].displayName"
Next Steps#
With discovery complete, you now have a comprehensive baseline of your Azure environment mapped against Landing Zone best practices. This output serves as the input for your remediation phase, where you’ll systematically address gaps to achieve a well-architected landing zone.
