ConvertFrom-String or Not

One of the new cmdlets in PowerShell v5 is ConvertFrom-String. It uses "FlashExtract", a machine learning technology from Microsoft Research. I was hoping that it would finally rescue us from RegEx! ~/Yay/Heh/g

I've seen several MVPs write about it, mostly before Microsoft released the final version but some recently as well.
The premise is that you provide samples of the data and mark what you want to extract from them, and the FlashExtract engine is supposed to be smart enough to evaluate the template you provided and return the properties you want. If only life were that simple...

Alright, here is our victim... Let's take a look at our WinSxS (component) folder. People always complain about how big it is and how unwieldy it gets over time as more and more patches come down. It's tempting to clean up old patches and other binaries, but that is also the shortest path to breaking your Windows if you remove a needed file.

Anyway, I am more interested in building an easy-to-read database of PowerShell objects out of its folder names.

They look like this:

Microsoft tells us how it is formatted in "What's that awful directory name under Windows\WinSxS". Oh, it's supposed to be a 'friendly-name' with this format:


So, the parts of the folder name are separated by underscore characters. That makes it easy to parse, right? Wrong! The "name" part can contain "_" too, as well as other characters like "." and "-".
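To make the problem concrete, here is a small sketch. The two folder names are rebuilt from field values that appear in the sample output in this post, and they assume the parts come in the order processor, name, public key token, version, language, hash. That order, and the idea that only the name part contains "_", are assumptions for illustration.

```powershell
# Assumed folder names, rebuilt from field values shown in this post
# (the field order is an assumption).
$simple = 'amd64_1394.inf_31bf3856ad364e35_10.0.10586.0_none_87b4eef7b03f2543'
$tricky = 'amd64_c_bluetooth.inf.resources_31bf3856ad364e35_10.0.10586.0_en-us_14d0dc285c89e9ba'

# A naive split gives six parts only when the name has no underscore.
$simpleParts = $simple.Split('_')
$trickyParts = $tricky.Split('_')
$simpleParts.Count   # 6
$trickyParts.Count   # 7, because "c_bluetooth..." was cut in two

# One workaround, assuming only the name can contain "_": keep the first part
# and the last four parts, and glue everything in between back together.
$name = $trickyParts[1..($trickyParts.Count - 5)] -join '_'
$name                # c_bluetooth.inf.resources
```

The same index arithmetic also works for the simple case, where the range collapses to the single element at index 1.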

Still, after two minutes of RegEx tinkering I had a function, and it worked perfectly:

PS C:\Users\Adil> (Get-WinSxSRegex)[1,100,1000]

Processor      : amd64
Name           : 1394.inf
PublicKeyToken : 31bf3856ad364e35
Version        : 10.0.10586.0
Language       : none
Hash           : 87b4eef7b03f2543

Processor      : amd64
Name           : c_bluetooth.inf.resources
PublicKeyToken : 31bf3856ad364e35
Version        : 10.0.10586.0
Language       : en-us
Hash           : 14d0dc285c89e9ba

Processor      : amd64
Name           : microsoft-windows-b..iagnostic.resources
PublicKeyToken : 31bf3856ad364e35
Version        : 10.0.10586.0
Language       : hu-hu
Hash           : 2d6242e40cbc20a9

The example above shows the records at indexes 1, 100 and 1000 (the index is zero-based, so these are the 2nd, 101st and 1001st records). Luckily, they are good examples to demonstrate the challenge that the ConvertFrom-String cmdlet faces.
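For comparison, a regex-based parser could be sketched roughly as follows. This is not the Get-WinSxSRegex function used above; the function name ConvertFrom-WinSxSName, the pattern, and the field order it assumes are illustrative only, and the sample folder name is rebuilt from the second record shown above.

```powershell
# Hypothetical helper: turn one WinSxS folder name into an object.
# Assumed field order: processor_name_publickeytoken_version_language_hash.
function ConvertFrom-WinSxSName {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$FolderName
    )
    begin {
        # Name is matched greedily, so any "_" inside it stays in the name;
        # the stricter fields after it force the regex to backtrack correctly.
        $pattern = '^(?<Processor>[^_]+)_(?<Name>.+)_(?<PublicKeyToken>[0-9a-f]+)_' +
                   '(?<Version>\d+(?:\.\d+){3})_(?<Language>[^_]+)_(?<Hash>[0-9a-f]+)$'
    }
    process {
        # Names that do not fit the pattern are silently skipped.
        if ($FolderName -match $pattern) {
            [pscustomobject]@{
                Processor      = $Matches['Processor']
                Name           = $Matches['Name']
                PublicKeyToken = $Matches['PublicKeyToken']
                Version        = $Matches['Version']
                Language       = $Matches['Language']
                Hash           = $Matches['Hash']
            }
        }
    }
}

# Try it on one folder name with an underscore inside the name part.
$record = 'amd64_c_bluetooth.inf.resources_31bf3856ad364e35_10.0.10586.0_en-us_14d0dc285c89e9ba' |
    ConvertFrom-WinSxSName
$record

# Against the real folder (Windows only):
# Get-ChildItem -Path "$env:windir\WinSxS" -Directory |
#     Select-Object -ExpandProperty Name | ConvertFrom-WinSxSName
```

The design choice here is to be strict about the fields that are easy to describe (hex strings, a four-part version) and let the name take whatever is left over.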

So, I took a couple of folder names and started marking them up with the property names I wanted.

The first property is supposed to collect the Processor, but if I do not use "*", it does not work. It seems that when the first matching line is found, that becomes a record, and to be able to get more records, we need that * in there.

The Name sample will work, but only for some folders. Because the sample starts with a number and there is no other sample for Name, the engine thinks we want something that starts with numbers, and gives us only the folder names that have numbers in the 'name' section. So, we need to give different Name samples.

To a human eye, the PublicKeyToken is pretty easy to locate. It's a hex string, and if we were using regex, we would use something like '[a-fA-F0-9]+', in other words one or more characters that are either digits or letters between 'a' and 'f'. Well, we are not dealing with a human here.

OK, the Version must be the easiest to identify however you look at it: four numbers with "." between them.

Language is a bit tricky. It could be none or some international language designation like "en-us" for US English.

Finally, we have the Hash, which is pretty much like the PublicKeyToken.
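Putting the properties above together, a template could look something like the sketch below. It is not one of the templates I actually used; the two sample lines are rebuilt from field values shown earlier in this post under an assumed field order, and there is no promise that the engine accepts it or extracts every folder correctly. The "*" marks the property that starts a record, and the two lines give different kinds of Name samples.

```powershell
# Illustrative template: one marked-up sample per line.
# "{Property*:sample}" starts a record, "{Property:sample}" is a plain field.
$template = @'
{Processor*:amd64}_{Name:1394.inf}_{PublicKeyToken:31bf3856ad364e35}_{Version:10.0.10586.0}_{Language:none}_{Hash:87b4eef7b03f2543}
{Processor*:amd64}_{Name:c_bluetooth.inf.resources}_{PublicKeyToken:31bf3856ad364e35}_{Version:10.0.10586.0}_{Language:en-us}_{Hash:14d0dc285c89e9ba}
'@

# ConvertFrom-String ships with Windows PowerShell 5.x, so only run where it exists.
if (Get-Command -Name ConvertFrom-String -ErrorAction SilentlyContinue) {
    Get-ChildItem -Path "$env:windir\WinSxS" -Directory |
        Select-Object -ExpandProperty Name |
        ConvertFrom-String -TemplateContent $template |
        Out-GridView
}
```

Adding more sample lines with other languages and name shapes is the obvious next step if the results in the grid look wrong.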

I will leave it here. I tried a couple of alternative templates, and providing more samples was not getting me anywhere. I kept getting error messages telling me that the ConvertFrom-String cmdlet could not come up with a way to extract data using my template and suggesting that I email Microsoft. I did, though I am not holding my breath on that.

I really like the concept of ConvertFrom-String, and in my opinion it could be one of the most powerful features of PowerShell, but to me it feels like it is just not there YET.

Sample templates I used are below. If you use the functions, pipe their output to Out-GridView. It is pretty easy to detect problems that way.
