Friday, June 27, 2008

What do you do when the limitations of SQL Management Studio will not show all of your results?

Our application works with an XML file defined by an external industry specification. We found that a significant portion of this file is not required by our application. So, when loading this file into our system, we remove the portions that we do not care about and stuff the rest into a database. What is left can still be a pretty good sized XML string. Common usage shows the reduced size to still be around 300-500 KB.

Recently, we needed to investigate the contents of one of these modified files as well as the history of what was loaded. Using the standard developer playbook, the first response is to quickly fire up SQL Management Studio and craft a query to get the ntext column. At that point, the 65K limit on a column in Management Studio yields about a quarter of the file.

It would be pretty easy to pull up Visual Studio and put together a quick utility to extract the XML into a file. As a developer, I would likely keep this project around for an excessively long period of time. After all, it is now part of my code toolbox. So, rather than squirrel away a project file, source file, and all the related baggage of a VS project, I chose PowerShell as the hammer for this nail. The C# code that would have been created in Visual Studio, in this case, is simply a language that is accessing the functionality in the .net framework. Since PowerShell has access to the .net framework, it is very similar code simply using a different language. Additionally, the PowerShell script is a single file without the weight of a VS project. The only thing that I keep in my toolbox is the code required to complete the task.

What is required is the pretty common code of opening a connection, executing a query, and using the results. This is how this common C# task is represented in PowerShell. We are pulling back several rows that contain this large XML field and tagging the resuts by upload date.

param( $theId, $server,$db )


$connection = New-Object System.Data.SqlClient.SqlConnection("data source=$server;initial catalog=$db;Integrated Security=SSPI")
$command = New-Object System.Data.SqlClient.SqlCommand("select * from MyTable where theItemId=$theId", $connection)

$connection.Open()

$reader = $command.ExecuteReader()
while( $reader.Read()  ) {
    $fileName = "Item_{0}_DateLoaded_{1:yyyy_MM_dd_HH_mm_ss}.xml" -f $theId,$reader['dStart']
    Out-File  -FilePath $fileName -InputObject $reader['reallyBigXMLText']
}

$connection.Close()
$reader.Close()

Save that text into a .ps1 file and that is all that is necessary to extract the large text field. I am not one that has cut over from the classic command line to PowerShell. At the same time, it is important to know when this tool can be helpful.

Friday, June 27, 2008 1:11:03 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  |  Trackback
Monday, June 23, 2008

The next topic in this series was supposed to be invariants. While that topic will still be coming up, the topic of nulls was not complete enough. In the last post, coverage was mostly related to parameters and method signatures. Since parameters are not the only variables in a system that can be null, it makes sense to talk a little about class fields. Consider the following class member from the Sentence class discussed in Part 1.

List<string> words = new List<string>();

Several methods and properties in the sentence class perform operations on this variable by dereferencing it and calling its methods and properties. Spec# notices this and warns of a potential null dereference. At first this seemed like a bit of an inconvenience. But if you think about this a second, this warning is for every location where "object reference not set to instance of an object" can occur. Wow, removing that error from the development cycle would no doubt be a huge boost to productivity. So a simple change to the signature eliminates that warning.

List<string>! words = new List<string>();

So what if you are not instantiating objects the same way for each class creation. Try removing the new from the declaration, and a whole different error shows up. This time the constructors all get flagged. Spec# notices that the fields that have been marked as non-null have not been assigned in the constructor. New up a sentence object in the constructor on the validation engine is now happy.

There is an impressive depth to the null checks that can lead to a much more solid code base. In the strong typed world where compiler errors are preferred, Spec# strengthens the type checking pushing potential errors earlier in the development cycle to the point of some of them showing up in the editor pre-compile time!

Monday, June 23, 2008 3:26:17 PM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  |  Trackback
Friday, June 20, 2008

One of the key characteristics that makes a professional-quality application is defensive coding. While necessary, this can hide the meaningful portions of logic. It can also lead to many tests that are more related to the chosen implementation than the business rules. Take this simple piece of code to add a word to a sentence.

public void AddWord(string word)
 {
     if (word == null) throw new ArgumentNullException("word");
     if (word.Length == 0) throw new ArgumentException("word");
     words.Add(word);
 }

The validation code dominates this simple function. Additionally it does not provide the consumer of the method with any hints to the parameters constraints. Sure, xml docs can provide some documentation, but that still does not help at design time. A set of tests that check the expected exception would also be common for this block of code. There is plenty to debate about the non-functional tests, but it is safe to assume that this kind of testing is common.

The Spec# team is doing some incredible work to make constraints like this either enforced by the language or more visible by placing them within the method signature.

The first parameter check relates to nullness. Spec# provides an operator for forcing non-null. The signature of AddWord would now look like:

public void AddWord(string! word)

This tells the compiler and Visual Studio that this parameter cannot be null. If you even try to pass a string that can be null to this method, a warning will appear at design time. A compiler error will also result. This really cranks up the visibility of the constraint. The user of this method can also see the constraint right in the signature.

The second parameter check relates to some data constraint on the string. In this case, it is required to be a non-empty string. What is really going on here is that this method has a pre-condition relating to the data it is going to work with. Preconditions in Spec# are specified with the requires keyword. So our method now looks like:

public void AddWord(string! word)
 requires word.Length > 0;
{
 words.Add(word);
} 

Now the requirements are part of the method signature that show up in intellisense. This alone is a big help to the development process. The precondition is validated at runtime kicking out an exception if the condition is not met. It is worth noting that you can map the stock RequiresException back to an argument exception via the otherwise keyword.

requires word.Length > 0 otherwise ArgumentException;

It is pretty evident to see the big gain here in the signature of the method containing all of the relevant constraints. Even more assistance can be provided at design and compile time when these constraints are provided in the signature. Even better than all of this is that the body of the method is left with only the required logic the method is responsible for.

There is plenty more to cover with the direction spec# is taking. Next up, validating the internal state of an object as part of the class definition.

Friday, June 20, 2008 10:39:57 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  |  Trackback
Wednesday, February 13, 2008

One of the things I really enjoy about the local developer groups is the sharing of ideas and usage patterns of common tools. At tonight's Ann Arbor .net Developers Group, the topic of CodeRush templates came up. I commonly use a set of templates centered around the Rhino Mocks framework As I was describing these templates Jay Wren claimed they should be posted. The irony in that is the first time I saw someone using CodeRush was in a presentation Jay was giving on IoC. As a result of that presentation, we are using Castle Windsor in our application and CodeRush as a productivity tool. 

Creating a Mock Instance

To use Rhino Mock to create a mock instance, you write something like:

MyType myType = (MyType)mocks.DynamicMock(typeof(MyType));

This is an exceptionally redundant exercise that just screams for a Template. The core part of the template takes a string from the Get string provider (named Type) for the type of the instance to create. It will also name the variable with the same name as the type with a slightly different format. Even though this variable name is a linked field, you can break the link by pressing Ctrl-Enter while the cursor is on the variable name. This is commonly required in when creating multiple mock instances in the same scope.

#MockInstance#: Base template not intended to be called directly

«Caret»«Link(«?Get(Type)»)»«BlockAnchor»«Link(«?Get(Type)»,FormatLocalName,PropertyNameFromLocal)»= («Link(«?Get(Type)»)»)mocks.DynamicMock(typeof(«Link(«?Get(Type)»)»));

mk: the type name is initialized to MyType

«?Set(Type,MyType)»«:#MockInstance#»

mk\: the type name is initialized from the Clipboard

«?Set(Type,«?Paste»)»«:#MockInstance#»

Setting up the Tests

The mock instance templates are not much use without having the MockRepository set up and test methods to call. These templates come in handy as well.

mtf: the test fixture

[TestFixture]
public class «Caret»My«BlockAnchor»Test
{
 private MockRepository mocks;

 [SetUp]
 public void Setup()
 {
  mocks = new MockRepository();
 }
 
 «Marker»
}


mt: the test

[Test]
public void «Caret»Test«BlockAnchor»()
{
 «Marker»
 mocks.ReplayAll();
}

Hopefully these templates will be found useful and maybe even kick off some ideas for new ones.

UPDATE: Yea, it would be easier if I added the export file.

CSharp_Custom.xml (12.45 KB)
Wednesday, February 13, 2008 11:44:18 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  |  Trackback
Tuesday, February 12, 2008

PowerShell has been a topic of interest for me for the last couple of weeks. I have been squeezing in some reading on it here and there for about a little over a week. Out of the blue pops up one of those issues at work that does not involve making an elegant object model, but rather digging through log files. How cool is that? For more years than I care to mention, it seems like every time I start looking into a new topic, something comes up where that research saves a ton of time. So there I am whipping up this script, when SharpReader pops up toast that says Using an IDE to write PowerShell Scripts. If you are doing any PowerShell work at all, go ahead, stop reading this post and go get that tool. The post will be here when you get back. The PowerShell Suite has been discounted for 2 weeks associated with Hanselman's post and there is a free version for non-commercial use.

The issue was that users in Germany were getting access denied during a particular time frame. Since this time frame overlapped with with the nightly import of user information, there was some speculation that there was a collision with the import. We wanted to get a list of users that got the access denied and correlate that with the users that were imported. That leaves us with 500MB for 2 hours on each server. 8 web servers and a 4 hour period in question yields 16GB worth of log files to mine. Definitely an automation task.

The log file looks like:

... cut
[8364/18416][Tue Feb 07 04:30:27 2008][..\..\..\CSmHttpPlugin.cpp:402][INFO:1] PLUGIN: ProcessResource - Resolved Url '/security/accessdenied.aspx?ReturnUrl=index.aspx'.
[8364/18416][Tue Feb 07 04:30:27 2008][..\..\..\CSmHttpPlugin.cpp:515][INFO:1] PLUGIN: ProcessResource - Resolved Method 'GET'.
[8364/18416][Tue Feb 07 04:30:27 2008][..\..\..\CSmHttpPlugin.cpp:566][INFO:2] PLUGIN: ProcessResource - Resolved cookie domain '.SITENAMEWASHERE.com'.
[8364/18416][Tue Feb 07 04:30:27 2008][..\..\..\CSmHttpPlugin.cpp:3727][INFO:1] PLUGIN: EstablishSession - Decoded SMSESSION cookie -  - User = 'uid=USERIDHERE,dcxdealercode=DEALERCODEWASHERE,ou=dealerships,ou=dealers,o=DOMAINWASHERE.com', IP address = 'IPADDRESSWASHERE'.
... cut

What we needed was the occurrence of access denied where the source was the home page. A few lines later in the log, the SiteMinder user information was present. The script then needs two modes: looking for access denied and looking for the user info. Get-Content provides reader like functionality kicking out a line of text at a time from the provided file. Piping to a ForEach calls the associated script block for each line of text in the file. Use some Regex magic to find the correct string and extract information from the user information line. The result of the Regex is where I ran into the biggest hang up. PowerShell has an automatic variable $matches that is populated when a match is found (when a match is found not when a match is attempted). So I had to clear out the $matches variable after processing a match. The only output was the parsed user information that could be passed off to Brian, who deals with the imports. It is a pretty simple piece of script that looks like:

param ( $infile )

$lookingForSiteMinderCookie = $FALSE
$matches = $null
$Get-Content $infile | ForEach-Object {
  if( $lookingForSiteMinderCookie )
  {
    if ( $_ -match '^.*\[.*\](\[.*\])\[.*\]\[.*\].*uid=([a-zA-Z0-9]*).*dcxdealercode=([0-9]*).*$' )
    {
      "TimeStamp: {0}`tUser: {1}`tDealer: {2}" -f $matches[1],$matches[2],$matches[3]
      $matches = $null
      $lookingForSiteMinderCookie = $FALSE
    }
  }
  else
  {
    if ( $_ -match '^.*accessdenied\.aspx\?ReturnUrl=\%2fhome\%2fmain\.aspx.*$' )
    { 
      $matches = $null
      $lookingForSiteMinderCookie = $TRUE 
    }
  }
}

 

This was saved as LogParse.ps1.

From the PowerShell command line:

  • Use Get-ChildItem with a filter that matches the log file names to get a list of log files
  • Iterate through each file using ForEach calling LogParse.
  • Output to Out-File to save the results

Get-ChildItem logfilepattern | ForEach-Object { .\LogParse $_ } | Out-File deniedUsers.results

I have found PowerShell to be a very useful tool for doing some quick and dirty tasks. I can certainly see that with a bit more experience with it, a command line .net interpreter could be very handy. Check it out. You can rarely go wrong by having another tool in the toolbox. I do want to point out that this was primarily an exercise in using Powershell because I wanted to learn it better. The power of PowerShell is not going to come in text processing where there are many tools that have been around for a very long time that do that job very well. Many shells use strings when piping between commands. Passing a string object is not that much better. The real power comes when you start using objects and even more when you use objects in the pipeline. That is the feature unique to PowerShell.

Tuesday, February 12, 2008 8:29:01 AM (Eastern Standard Time, UTC-05:00)  #    Comments [0]  |  Trackback
Thursday, January 17, 2008

Predicate: Logic. that which is affirmed or denied concerning the subject of a proposition.

In programming terms, a function that returns true or false taking a subject as a parameter. So, Predicate<T> is really just a specific signature applied to a delegate. A method compliant with this signature looks like:

bool method(T item)

The collection classes in System.Collections.Generic have several methods that will take a predicate as a parameter. I really like this separation of concerns. The responsibility of iterating through the collection is owned by the collection; the responsibility of making a decision based on data in an object is owned by the object. What seemed to be missing to me was the common operation of combining boolean results with logical operators. What I really wanted to do was to pass a set of predicate functions to the collection methods. Of course, the collection authors could not know how I would want the boolean results combined. This seemed to be the responsibility of something else that could contain a list of predicates, but at the same time return a predicate function that is the logical combination of the results.

The PredicateList class is a container of Predicate<T> delegates. The methods that can be used as a predicate are the logical AND and OR operations. These methods iterate through the contained predicates and short circuit when arriving at a value that determines the result of the operation.

using System;
using System.Collections.Generic;

namespace Utilities
{
  public class PredicateList<T> : List<Predicate<T>>
  {
    public PredicateList(params Predicate<T>[] predicates)
    {  AddRange(predicates);  }
    public bool And(T item)
    {
      foreach (Predicate<T> pred in this)
      {
        if (!pred(item)) return false;
      }
      return true;
    }
    public bool Or(T item)
    {
      foreach (Predicate<T> pred in this)
      {
        if (pred(item)) return true;
      }
      return false;
    }
  }
}

Using this class with one of the collection methods would look like:

PredicateList<T> filters = new PredicateList<T>(IsThis, IsThat, IsTheOtherThing);
return FindAll(filters.And);

Thursday, January 17, 2008 10:31:02 AM (Eastern Standard Time, UTC-05:00)  #    Comments [1]  |  Trackback

Theme design by Jelle Druyts