Quick Tip: Azure SQL Server Connectivity

Symptoms

  1. An application server in Azure can’t connect to an IaaS SQL Server on Windows (also in Azure).
  2. The Connection Troubleshoot utility in the Azure Portal reports that network connectivity between the App server and the SQL Server on port 1433 is allowed.
  3. PowerShell's Test-NetConnection, run on the App server, shows that communication with the SQL Server on port 1433 is blocked.
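The Test-NetConnection check in step 3 can be wrapped up for reuse. This is a minimal sketch. `Test-SqlPort` is a hypothetical helper name and the server name in the usage line is a placeholder. `Test-NetConnection` (with its `-Port` parameter and `TcpTestSucceeded` result property) is the built-in cmdlet from the Windows NetTCPIP module, so this only runs on Windows.

```powershell
# Hypothetical helper (not a built-in cmdlet): run on the App server to check
# whether a TCP connection to the SQL Server port can be established.
function Test-SqlPort {
    param(
        [Parameter(Mandatory)] [string] $ComputerName,  # SQL Server VM name or IP address
        [int] $Port = 1433                              # default SQL Server instance port
    )
    # Test-NetConnection (Windows NetTCPIP module) attempts a TCP handshake on $Port;
    # -WarningAction SilentlyContinue hides the warning it prints when the attempt fails.
    $result = Test-NetConnection -ComputerName $ComputerName -Port $Port -WarningAction SilentlyContinue
    # $true if the handshake succeeded, $false if it was blocked or timed out
    $result.TcpTestSucceeded
}
```

For example, `Test-SqlPort -ComputerName sqlvm01` (placeholder name) returns `False` while something on the path, such as the SQL Server's Windows Firewall, is blocking the port.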

Cause

The Windows Firewall on the SQL Server is blocking communications from the App server.

Solution

Add a rule to the Windows Firewall on the SQL Server to allow SQL Server traffic (TCP port 1433 for a default instance). See Microsoft Docs for details on how to do this.
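Here is a sketch of that fix. It assumes a default SQL Server instance listening on TCP 1433 and uses the built-in `New-NetFirewallRule` cmdlet from the Windows NetSecurity module. `Add-SqlFirewallRule` is a hypothetical wrapper name, and the rule display name and the IP address in the usage line are placeholders.

```powershell
# Hypothetical helper (not a built-in cmdlet): run on the SQL Server VM
# from an elevated PowerShell session.
function Add-SqlFirewallRule {
    param(
        [int] $Port = 1433,                # default SQL Server instance port
        [string[]] $RemoteAddress = 'Any'  # narrow this to the App server's IP(s) where possible
    )
    # New-NetFirewallRule (Windows NetSecurity module) creates an inbound
    # rule that allows TCP traffic to $Port from $RemoteAddress.
    New-NetFirewallRule -DisplayName "SQL Server (TCP $Port)" `
        -Direction Inbound -Protocol TCP -LocalPort $Port `
        -RemoteAddress $RemoteAddress -Action Allow
}
```

For example, `Add-SqlFirewallRule -RemoteAddress 10.0.1.4` allows only the App server at that (placeholder) address. Afterwards, re-running Test-NetConnection from the App server should report `TcpTestSucceeded : True`.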

PowerShell: Get Usernames from Windows Security Log

This snippet takes an exported copy of the Windows Security log and returns the list of user IDs (account names) found in it.

Exporting the Logs

  1. Open Event Viewer in Windows, select the Security log and choose “Save All Events As…”, then save the file as CSV (Comma delimited).
  2. Open the exported file in Notepad and add “,Description” to the end of the first line. The header line doesn’t name the description column, so PowerShell won’t import that field otherwise.
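If you’d rather not edit the file by hand, the Notepad step can be done in memory instead. This is a minimal sketch. `Import-SecurityLogCsv` is a hypothetical helper name, and it assumes (as above) that the only problem with the export is the missing column name at the end of the header line. Each description spans several lines inside a quoted field. For that reason the file is read as one string (`Get-Content -Raw`) and passed to `ConvertFrom-Csv` in one piece rather than line by line.

```powershell
# Hypothetical helper (not a built-in cmdlet): import an Event Viewer CSV export
# whose header line lacks a name for the trailing description column.
function Import-SecurityLogCsv {
    param([Parameter(Mandatory)] [string] $Path)
    # Read the whole file as a single string so multi-line quoted fields stay intact
    $text = Get-Content -LiteralPath $Path -Raw
    # Split off just the header line; $body keeps everything after it unchanged
    $header, $body = $text -split '\r?\n', 2
    # Do in memory what the Notepad step does, unless the column is already named
    if ($header -notmatch ',"?Description"?\s*$') {
        $header += ',Description'
    }
    # Parse the repaired text as CSV, as Import-Csv would parse the edited file
    "$header`r`n$body" | ConvertFrom-Csv
}
```

Its output can stand in for the `Import-CSV` call in the snippet below, e.g. `$events = Import-SecurityLogCsv -Path .\securitylog.csv`.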

PowerShell Manipulation

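$events = Import-Csv -Path .\securitylog.csv
# For each event, split the multi-line Description into lines, keep the
# "Account Name:" lines, and take the trimmed text after the first colon
$result = foreach ($entry in $events) {
    $entry.Description -split '\r?\n' |
        Where-Object { $_ -like '*Account Name:*' } |
        ForEach-Object { ($_ -split ':', 2)[1].Trim() }
}
$result | Sort-Object -Unique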

The result is a list of the Account Names found in the file. See GitHub for further info and updates.

RTO With Cohesity @ vRetreat

How Cohesity’s Approach to VM Backup Affects the Recovery Time Objective

This week I attended another vRetreat online, this time featuring data vendor Cohesity, whom I saw presenting at the (in-person) event last year. These are great events, and the small panel of delegates works well in the virtual format.

One thing that stood out to me in their presentation was the focus on the Recovery Time Objective (RTO): in essence, how long it takes to recover from an incident. In this post I will briefly discuss how I understand the definition of RTO before looking at how the Cohesity products work to keep this time down when working with virtual machines.

Recovery Time Objective

There’s plenty of material out on the interwebs which will explain RTO in great detail, but I’m taking the definition to be:

the expected length of time between an incident occurring and users being able to work normally again

As this diagram shows, the time can be split into a number of notable sections. I’ve chosen the following three:

[Diagram: RTO]

  1. Discovering the Incident. How long is it before we notice something is broken? Do we have to wait for a user to contact the service desk, or do we have responsive monitoring and alerting in place?
  2. Starting the Restore. How long does it take to actually start the restore operation? Is there a clear process to be followed? There might be internal decisions to be made as to whether to kick off a backup restore or attempt an in-place repair. Does somebody need to physically power on some equipment or find and load some tapes before a backup restore can commence?
  3. The Restore Operation. How long does it take between “Go” being pushed on the restore console and the service being usable again?

You’ll notice there’s also a fourth section on the diagram: the “Tidy Up”. This covers all the processes that need to happen after users are working again to get the system back into a normal state, such as tidying up the original (broken) copies of the VM, returning a backup tape to the library, or investigating the root cause. In all of these cases I’ve put this step outside the RTO because, by the definition above, users are already working normally again.
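Put as a formula, using the three sections above (the symbols are just labels for this post, not standard notation):

```latex
\text{RTO} = t_{\text{discover}} + t_{\text{start}} + t_{\text{restore}}
```

The Tidy Up time has no term here because users are already working normally by then. Any work that can safely be moved into that phase shortens the RTO, even if the total effort stays the same.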

Ransomware Detection

Recovery from ransomware attacks seems to be the current favoured feature pushed by backup vendors, and Cohesity are no exception. Their take is that because the Cohesity Data Platform handles all the backups, it sees all the data. This position in the data flow gives the rest of the Cohesity stack an opportunity to spot both when an unusual number of files have been changed and when files suddenly can’t be indexed because they’ve been encrypted.

Tied to an alerting mechanism, this helps with question 1 above: “Can we discover the incident quickly?” The sooner someone in IT is aware that a ransomware infection has happened, the sooner a response can be started.
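The “unusual number of files changed” idea can be made concrete with a deliberately simple illustration. It is not Cohesity’s detection logic, which wasn’t covered in that detail. `Test-UnusualChangeRate`, the averaging baseline, and the default multiplier of 5 are all arbitrary assumptions for the example.

```powershell
# Hypothetical illustration only, NOT Cohesity's algorithm: flag a backup run whose
# changed-file count is far above the average of previous runs.
function Test-UnusualChangeRate {
    param(
        [int[]] $History,                      # changed-file counts from earlier backup runs
        [Parameter(Mandatory)] [int] $Current, # changed-file count from the latest run
        [double] $Multiplier = 5               # arbitrary threshold for this example
    )
    # With no history there is no baseline to compare against
    if ($null -eq $History -or $History.Count -eq 0) { return $false }
    $baseline = ($History | Measure-Object -Average).Average
    # $true when the latest run changed many times more files than usual
    $Current -gt ($baseline * $Multiplier)
}
```

For example, `Test-UnusualChangeRate -History 120, 95, 130 -Current 4200` returns `True`: the baseline is 115 files, so the threshold is 575. With `-Current 140` it returns `False`.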

Additionally, regular point-in-time snapshot backups make it easier to spot when the infection started (or, if not the point of infection, at least when the malware started acting), and the more granular the timestamps, the less data is potentially lost between a backup and the incident. But we’re straying into RPO (Recovery Point Objective), not RTO, there.

Starting Restore

Most of the time, when responding to a major incident and orchestrating a restore operation, the user interface will be key to assessing the situation and bringing services back online. Cohesity offers a clean and tidy web-based UI, complete with the now-obligatory Dark Mode.


Whilst the platform isn’t going to make those go/no-go decisions on kicking off a restore, it can influence them. Because the restores are so quick (as we’ll see shortly), the discussion on whether to repair or restore might favour the latter. It’s also possible to bring up the VMs in a network-disconnected state without touching the production systems, so that once any discussions are complete the restore is even quicker (or, if the repair option is chosen, that restore can just be cancelled).

Restoring User Service

Once a recovery is started in Cohesity Data Protect, an NFS datastore is created on the Data Platform. The VMDK is already there, so no time is spent at this point moving blocks across the network. The NFS datastore is mounted in vCenter and the VM is registered, at which point the VM can be powered on and users can get working again.

Once service has been restored, the longer process of moving the VM files back where they belong is handled by the hypervisor’s own Storage vMotion technology (the fourth, “Tidy Up”, step above). Applications remain available throughout, and once the Cohesity datastore has been emptied, it is unmounted from vCenter.
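For readers who think in cmdlets, the flow above roughly corresponds to the following VMware PowerCLI steps. Cohesity automates all of this itself and I haven’t seen how it does so internally, so treat this as a hypothetical manual equivalent. It assumes PowerCLI is installed and `Connect-VIServer` has already been run. The function name and all host, datastore, and path values are placeholders.

```powershell
# Hypothetical manual equivalent of an "instant" restore, using VMware PowerCLI.
# Assumes an existing Connect-VIServer session; every name/path is a placeholder.
function Invoke-InstantRestoreSketch {
    param(
        [Parameter(Mandatory)] [string] $VMHostName,      # ESXi host that will run the VM
        [Parameter(Mandatory)] [string] $NfsServer,       # backup platform's NFS endpoint
        [Parameter(Mandatory)] [string] $NfsExport,       # export holding the backed-up VM files
        [Parameter(Mandatory)] [string] $VmxRelativePath, # e.g. 'app01/app01.vmx'
        [Parameter(Mandatory)] [string] $TargetDatastore, # production datastore to move back to
        [string] $RestoreDatastoreName = 'restore-ds'
    )
    $vmHost = Get-VMHost -Name $VMHostName
    # Mount the export as an NFS datastore: the VMDKs are already there, nothing is copied
    $ds = New-Datastore -Nfs -VMHost $vmHost -Name $RestoreDatastoreName -NfsHost $NfsServer -Path $NfsExport
    # Register the VM from its .vmx on that datastore and power it on (users can work again)
    $vm = New-VM -VMFilePath "[$RestoreDatastoreName] $VmxRelativePath" -VMHost $vmHost
    Start-VM -VM $vm | Out-Null
    # Tidy Up: Storage vMotion the running VM back to production storage, then unmount
    Move-VM -VM $vm -Datastore (Get-Datastore -Name $TargetDatastore) | Out-Null
    Remove-Datastore -Datastore $ds -VMHost $vmHost -Confirm:$false
}
```

A real product also handles details this skips, such as the prompt vSphere may raise when powering on a VM registered from a new location.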

As this slide extract from the Cohesity presentation shows, one of their big selling points is this quick recovery process. Notice how “Recover data to target storage device” is positioned after user access is restored.


Thanks to Patrick Redknap and the Cohesity team for hosting this informative event, and I look forward to the next one. For more information about Cohesity, check out their website: https://www.cohesity.com/

Please read my standard Declaration/Disclaimer, and before rushing out to buy anything, bear in mind that this article is based on a sales discussion at a sponsored event rather than a POC or production installation. I wasn’t paid to write this article or offered any payment, although Cohesity did sponsor a prize draw for delegates at the event.

AZ-104 Azure Administrator Associate

Last week the Microsoft AZ-104 exam went live, and all those who took the test during the beta period (myself included) were issued with their results. The good news is I passed!

This post will discuss the exam and some of the learning resources I used. Just in case you’re looking for a brain-dump, sorry: I’m not going to be giving out example questions I remembered from my test paper, or even “I had 7 questions on PowerShell, 4 on Application Gateways, and 3 on Custard Flavours”. That’s not allowed, and it’s not really helpful for anyone honestly trying to pass the exam.

Stepping up from Fundamentals

Unsurprisingly, the questions were a natural step up from the AZ-900 Fundamentals exam I took last year, and the content felt more focused on admin tasks, with less of the general “Cloud Computing” viewpoint.

This is to be expected, as the exam is for the next level up on the qualification ladder, but it also fits in with Microsoft’s role-based qualification mentality. In the past there were more product-based exams (“Learn everything about Windows Server 2012”, for example), but these have been replaced with a “Learn everything you need to be an Infrastructure Admin”, “Security Engineer”, or “Data Engineer” approach. Check out this chart for details of the current certification offerings and how they fit these roles.

Preparation Materials

Earlier in June I sat the official 4-day course (M-AZ104), hosted by Global Knowledge. The trainer, the 17 other students, and I were all connected online, as in-person classroom options aren’t available at the moment. This method works (and I took plenty of notes), but it’s not quite the same as all being together in person. For starters, we had to provide our own biscuits!

In post-exam hindsight, I think the course probably covered all the material, but perhaps not every topic to the depth the exam required. So, in addition to hands-on experience, I supplemented my training notes with lab work and other materials to reinforce the areas I felt weaker on. I used some of the following:

The Questions

My exam opened with a couple of case studies, where you’re given a number of questions around a common environment/problem description. This was followed by a big “normal” multiple-choice section in the middle, and then an extra surprise case study at the end (I’d guess for about the last 10 questions or so). There’s plenty of time to do the test, but manage your time carefully, as you can’t go back to the previous sections once you move on.

From memory, all of my questions were multiple choice or “put these items in order” questions. There is no lab environment in this exam.

Content-wise, it was a real spread across everything on the syllabus. In particular, make sure you know the proper Azure terms and where they apply: Availability Zones vs. Availability Sets, for example. It would be easy to lose marks by picking the wrong one in a given situation, or even by choosing an answer containing a term that doesn’t exist.

AZ-104 succeeded AZ-103 as the exam for this qualification, and the main areas of new content I spotted on the syllabus were Containers (Azure Container Instances and Azure Kubernetes Service) and Web Apps (including App Services and App Service Plans). Most of the AZ-103 learning material is therefore still valid, but make sure you check the updated list of skills required. I think a basic general understanding of Kubernetes/Docker/containers would also be worthwhile for that section.

Online Exam

Thanks to the COVID-19 situation, I took this exam from home. I hadn’t done that before, as I’ve usually gone along to the local testing centre. Here are a few tips if you’re planning on doing the same:

  1. Find a space at home that’s nice and free of clutter. You can’t have your “Azure for Dummies” poster on the wall and techie books lying around. Also, remember no-one can walk into, or be overheard in, your exam space during the test.
  2. Ensure you have a stable network connection, so you might want to kick the kids off Netflix. I also ran a long Ethernet cable from my broadband router to the room I was taking the test in to avoid Wi-Fi hiccups.
  3. Prior to the exam (with Pearson) you’re offered a chance to test your environment. It makes sense to do this on the day of your exam, but be aware that it takes some time, as you’re not only testing the network, webcam, and microphone but also going through the process of taking and uploading photos of your testing space. You’ll have to repeat that bit before the exam itself.

If you’re planning on taking AZ-104 soon I hope this all helps, and good luck!