SSIS: Removing HTML from a field using Script Component

Let’s say you are transferring data from the database of one program to the database of another.  One of the fields, say a DT_NTEXT field called “usedescription”, contains HTML because the old program used that markup to format the text on its screen.  The new program doesn’t need that HTML, and the markup may even cause issues.  So, how do you move the data from one database to the other and properly remove the HTML?

In SSIS, you can use the script component and the HTML Agility Pack.  The HTML Agility Pack is a free code library that you can use to parse HTML (or remove it as we are doing).  First, go to the following website to download the library:

http://htmlagilitypack.codeplex.com

Note that you can now get the library via NuGet.

Once you’ve downloaded the library and unzipped it, note that the HTML Agility Pack contains sub-folders for the various versions of .NET and even WinRT and Windows Phone.  Since we are using SSIS on SQL Server 2014, copy the files in the .Net40 directory to the C:\Windows\Microsoft.NET\Framework\v4.0.30319\ directory of the server that will be running your SSIS packages.  Next, add a Script Component to your data flow, open it, and add the field in question to the list of input columns (don’t forget to set it to ReadWrite).  Click Edit Script and, when the code editor comes up, add HtmlAgilityPack.dll to your list of references (you may have to browse for it).

image

You will also need to add the namespace: using HtmlAgilityPack;

Here is the code in the script component with comments along the way (note that I took out the stuff we don’t need):

image
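For readers who cannot see the screenshot, here is a rough sketch of the same idea outside of SSIS. It is not the script from the package: it is written in Python and uses the standard library's html.parser as a stand-in for the HTML Agility Pack. The names strip_html and _TextExtractor are made up for illustration. In the real Script Component the equivalent work is done in C# against the usedescription column of each row.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def strip_html(value):
    """Return the text content of value with all markup removed."""
    if value is None:
        return None  # leave NULL columns alone
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flushes any text still buffered by the parser
    return parser.text()


print(strip_html("<p>Take <b>one</b> tablet &amp; rest.</p>"))
```

As with the package itself, this only takes the markup out. Whitespace and other leftovers still need their own clean-up pass.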

When running our SSIS package, the HTML Agility Pack removes unwanted HTML quite nicely.  Check out our before and after screenshots:

Before:

image

After:

image

Obviously, there is more clean-up to be done.  However, getting the HTML out was a good first step.

JamesNT

Copying files over RDP that are larger than 2G

Ran across this nice little gem today.  I was trying to copy a file to my personal server that I got from a friend so I could do some work to it.  I kept getting the error “unspecified error.”

image

After a quick Bing search, it turns out this is a known error.  In fact, here is the KB article from Microsoft.  Short version:  copying files larger than 2 GB across an RDP connection by right-clicking the file on your desktop and then pasting it in the RDP screen (aka clipboard redirection) is not supported.  You’ll need to map the drive to the remote computer using the RDP client as the KB article suggests.
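If you script your file transfers, a quick size check up front saves you from finding this out halfway through. The following is a minimal Python sketch. It assumes the "2G" limit means 2 GiB, and the function names are made up for illustration.

```python
import os

# Assumption: the "2G" limit from the post title is read as 2 GiB.
CLIPBOARD_LIMIT_BYTES = 2 * 1024**3


def too_big_for_clipboard(size_bytes):
    """True when a file of this size should not go through clipboard copy."""
    return size_bytes > CLIPBOARD_LIMIT_BYTES


def copy_method(path):
    """Suggest how to move the file at path across an RDP session."""
    size = os.path.getsize(path)
    return "mapped drive" if too_big_for_clipboard(size) else "clipboard"
```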

JamesNT

SQL Server: Migrating a Database from version 2008 to 2012 Part II

In our last post, found here, we used the backup and restore functionality of SQL Server 2008/2012 to move a database from one version to the other.  Remember that this is a one-way transfer.  To the best of my knowledge, the SQL Server 2012 version of the database cannot be moved back to 2008.  Even though you may set the compatibility level to stay at 2008, the format of the database files is changed.

Today, we’ll look at the second way to move a database from SQL Server 2008 to SQL 2012:  Detach and Attach.  Note that this is not the preferred method of moving a database from one version of SQL Server to another – or even to another SQL Server of the same version.  The reason is this method involves downtime.  When you detach a database, connections to that database may be dropped and no one can connect until the database is reattached.  Let’s take a look.

  image

Note that you can forcibly drop connections to the database.  You may need to, because the detach will fail if any connections are still open.  Keep in mind that many Line-of-Business applications do not like having their connections forcibly closed on them and may themselves crash.  Also note that once you detach the database it will no longer be listed in your database list.

image

Once the database is detached, you can use Windows Explorer to copy the database files to the new location on the new server.  Once the files are there, you can attach them to the new server.  If you still need the database active on the old server, don’t forget to re-attach it using the same procedure shown below.

image

image

You may need to set file paths.

image

Once you click OK the database will be attached.  Again, be certain to note that the database file format will be upgraded making it unusable on former versions of SQL Server.
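The screenshots use Management Studio, but the same steps can be scripted. Below is a hedged Python sketch that only builds the T-SQL text (ALTER DATABASE ... SET SINGLE_USER, sp_detach_db, and CREATE DATABASE ... FOR ATTACH). It does not connect to anything. The helper names, the database name, and the file paths in the usage line are all hypothetical.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_string(value):
    """Quote a value as a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


def detach_script(database):
    """Statements to run on the old server while connected to master."""
    return [
        # Rolls back open transactions and drops other connections,
        # much like the "Drop Connections" checkbox in the dialog.
        f"ALTER DATABASE {quote_name(database)} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
        f"EXEC sp_detach_db @dbname = {quote_string(database)};",
    ]


def attach_script(database, file_paths):
    """Statement to run on the new server once the files are copied over."""
    files = ",\n    ".join(f"(FILENAME = {quote_string(p)})" for p in file_paths)
    return f"CREATE DATABASE {quote_name(database)} ON\n    {files}\nFOR ATTACH;"


# Hypothetical database name and paths, for illustration only.
print("\n".join(detach_script("Claims")))
print(attach_script("Claims", [r"D:\Data\Claims.mdf", r"D:\Data\Claims_log.ldf"]))
```

Since the detach step switches the database to single-user mode, it may be worth checking the access mode after the attach and setting it back to MULTI_USER if needed.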

As noted previously, this is not the preferred way to upgrade, or even just relocate, a database to a new server.  It does involve downtime and you may have to kick people off for it to work.  The preferred method, in my opinion, is to do a backup and restore.

JamesNT

SQL Server: Migrating a Database from version 2008 to 2012 Part I

A lot of companies out there are still on SQL Server 2008 (some are still using 2005!).  Now that SQL Server 2012 has been out awhile, I’m seeing questions on what it takes to migrate a database from SQL Server 2008 to 2012.  The bad news is doing an in-place upgrade from 2008 to 2012 is out of the question.  For one thing, many of you are still running SQL Server 2008 on Windows Server 2003 and SQL Server 2012 requires Windows Server 2008 Service Pack 2 or higher.  So you’re going to stand up a new server no matter what.  The good news is that it’s still easy.  In fact, you have two ways to do it.  Either perform a backup and restore or do a detach and attach.  This part covers backup and restore.

Backup and Restore

You can back up your SQL Server 2008 database and restore it to a SQL Server 2012 machine.  Keep in mind that I’m talking about user databases, not any of the system databases.

First, back up your SQL 2008 database.

backup20081

backup20082

Once your backup is complete, copy the BAK file over to your new server.  Once done, restore the database.  Note that you don’t have to create a database first just to restore over it.

2012restore1

2012restore2

2012restore3

2012restore4

Once your database is restored, you can set compatibility level options in case you have older applications that expect some type of older behavior.

image

Keep in mind that this option is not a panacea and that you should thoroughly test before just shutting off your old SQL Server 2008 machine to ensure current applications don’t depend on some old behavior or deprecated feature.  Also, do keep in mind that despite setting the compatibility level to SQL Server 2008, the database file format itself is still upgraded to SQL Server 2012.  This means you cannot move the database back to your old SQL Server 2008 machine.  If you need to move back, you’ll have to extract all the data that has changed since the last backup and import manually.  I know of no way to restore or attach a SQL Server 2012 database to an older version of SQL Server.
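The dialogs above have T-SQL equivalents. The Python sketch below just assembles the statement text (BACKUP DATABASE, RESTORE DATABASE ... WITH MOVE, and ALTER DATABASE ... SET COMPATIBILITY_LEVEL) so you can see their shape. The function names, the database name, and the paths are hypothetical, and the names are assumed to contain no quotes or brackets.

```python
def backup_statement(database, bak_path):
    """Run on the SQL Server 2008 machine."""
    return f"BACKUP DATABASE [{database}] TO DISK = N'{bak_path}';"


def restore_statement(database, bak_path, moves=None):
    """Run on the SQL Server 2012 machine.

    moves maps each logical file name to its new physical path. It is only
    needed when the new server uses different folders than the old one.
    """
    statement = f"RESTORE DATABASE [{database}] FROM DISK = N'{bak_path}'"
    if moves:
        clauses = ",\n    ".join(
            f"MOVE N'{logical}' TO N'{physical}'"
            for logical, physical in moves.items()
        )
        statement += "\nWITH " + clauses
    return statement + ";"


def compatibility_statement(database, level=100):
    """Level 100 corresponds to SQL Server 2008; 110 is SQL Server 2012."""
    return f"ALTER DATABASE [{database}] SET COMPATIBILITY_LEVEL = {level};"


# Hypothetical names and paths, for illustration only.
print(backup_statement("Claims", r"D:\Backups\Claims.bak"))
print(restore_statement("Claims", r"E:\Claims.bak",
                        {"Claims": r"E:\Data\Claims.mdf",
                         "Claims_log": r"E:\Logs\Claims_log.ldf"}))
print(compatibility_statement("Claims"))
```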

JamesNT

SSIS: Using Expressions to build SQL Statements for ADO.NET Connections

In SQL Server Integration Services, you can specify that an OLE DB Connection use a SQL Statement from a variable.

image

Using this approach, you can dynamically build SQL statements using the OLE DB Connection.  But what about ADO.NET?

image

It appears we have no way to dynamically build SQL statements when using ADO.Net providers.  And to think I’ve been standardizing on them.  On the other hand, maybe we do have a way.  I made a package with two variables.  One is a DateTime called LoadControl and the other is a string called strSQL.  I’m going to load a DateTime from a load control table into the LoadControl variable then use the LoadControl variable to build the WHERE clause of a SQL statement to pull out all medical claims with a date of service greater than or equal to the LoadControl date.  First, our variables.

image

Nothing hard about that.  Next, our Execute SQL task to populate the LoadControl variable.

image

image

Next up, our data flow. 

image

The only thing I’ve done is set up the ADO.Net Source.  Now we need to get our strSQL variable populated.  First, be certain to set the property EvaluateAsExpression to TRUE for strSQL.

image

Next, create an expression for this variable like so.  Notice that since DateTime variables cannot be NULL when you create them, SSIS fills in the current DateTime, hence the 5/3/2015 7:38:04 PM.

image
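If the screenshot is hard to read, the shape of the expression is simple: a fixed SELECT with the LoadControl value spliced into the WHERE clause as text. Here is a rough Python equivalent of that string-building. The table and column names (dbo.MedicalClaims, ClaimID, DateOfService) and the ISO date format are assumptions for illustration, not necessarily what the package uses.

```python
from datetime import datetime


def build_claims_query(load_control):
    """Build the SELECT the way the strSQL expression does: by concatenation."""
    # ISO 8601 text avoids any ambiguity between month/day and day/month.
    cutoff = load_control.strftime("%Y-%m-%dT%H:%M:%S")
    return (
        "SELECT ClaimID, DateOfService FROM dbo.MedicalClaims "
        f"WHERE DateOfService >= '{cutoff}' "
        "ORDER BY DateOfService"
    )


print(build_claims_query(datetime(2015, 4, 10)))
```

Concatenation is acceptable here only because the value comes from our own load control table and is a DateTime. Text typed in by a user should never be spliced into a statement this way.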

Now the interesting part.  From the Control Flow, single-left click your data flow to highlight it.  Now, look over at your Properties for your data flow.  Scroll down to the Misc. section. 

image

That’s right, you see the SQL statement for the ADO.Net source.  Of course, this is where it is important to call your connection sources something meaningful so you can find them readily (I didn’t bother since we only have one).    Notice that we have two spots for the ADO.Net source:  SQL Command and TableOrViewName.  We aren’t going to change the SQL statement there.  Rather, go down further until you see Expressions.  That’s right, build an expression.

image

image

Notice that for this expression, we only need the strSQL variable.  Once you have that saved, put a data viewer on your package and run it.

image

Notice that only dates for 4/10/2015 or higher are shown (I added an Order By to the SQL Statement in strSQL and 4/10/2015 is what was in our LoadControl Table).  This is where our Expression was evaluated and placed in for the SQL Command of the ADO.Net source.  One thing of note:  I had already entered a SQL statement in the ADO.NET Connection when I first created it.  That statement is IGNORED when the Expression is evaluated.  However, if the SQL Statement in your Expression adds or changes columns, you may need to go into the Advanced Editor of the ADO.NET Connection and click Refresh to get those changes to show.  Otherwise, your new or changed columns may not show up in the data flow right away.

image

With this approach, you can still dynamically build SQL statements for ADO.NET Connections like you can OLE DB Connections.  A little more work, yes, but I think worth it when you have lots of Script Tasks/Components that need to use Connection Managers.

JamesNT

Why Windows Server 2003 Will Be Around A While Longer

Hello, Everyone.

It’s been just over a year since my last post on my blog.  Things got really interesting last year both on the professional and personal fronts.  As time moves on, I’ll discuss many of those things here.  To start off 2015 with blog posts, I thought I would cover something I’m seeing a lot of that we saw last year:  The end of support for a major Microsoft product.  Last year it was Windows XP.  This year, it’s Windows Server 2003.

There have been lots of articles on the web recently about the end-of-life of Windows Server 2003.  Like its little brother, Windows XP, Windows Server 2003 is to have support completely and permanently removed by Microsoft unless you are willing to pay Microsoft some additional money to extend support.  For more information on the Windows Server 2003 lifecycle, as well as the lifecycle of other MS products, visit the Microsoft Product Support Lifecycle page.  The problem I have with most of these articles is they generally fall into two categories:

  • People just don’t want to move; therefore, are being irresponsible.
  • It’s the economy, stupid!

But there is a third reason that I have seen, and in my opinion it is more prevalent than either of the two above, as to why so many are delaying their migration off of Windows Server 2003:  There are still lots of applications out there that will not run on more modern versions of Windows.  Especially applications written in VB6.

Windows Server 2003 is the last server operating system that will comfortably run VB6 applications without much fuss.  Once you hit Windows Server 2008 it’s pretty much game over for those applications.  While Microsoft does offer limited support for VB6 all the way through Windows 8.1, the problem with every VB6 app I have seen is that they break so many development rules that getting them to run on any version of Windows past Windows Server 2003 is practically impossible.  More modern versions of Windows have higher security standards and so forth that just won’t allow an errant application to do whatever it wants.  I realize that Microsoft has tools available to help with these things, but the point stands that getting many older applications to run on newer versions of Windows is painful and expensive, and the application may still run less stably than before.

Some of you may wonder aloud why a company would still be on an application written in VB6 or, at the very least, old enough to not run on later versions of Windows.  Because moving line-of-business applications is HARD.  I’ve done this before, am in the process of doing it now, and I can tell you it is HARD.  Please consider the following:

  • Many line-of-business applications today are much more expensive than they were years ago.  A great example is the healthcare arena.  Practice Management and Electronic Medical Record systems that cost ~$5,000 10 years ago are well over ~$20,000 today.  That’s a big jump and most certainly hard to swallow.  One can’t help but wonder how many other applications cost more today than they did yester-year.
    • Along this line goes lack of expertise.  Just because you are on version 1.x of a piece of software and need to go to 12.x doesn’t mean it’s easy.  You may need to phase your upgrades across virtual machines and so forth.  For example, what if the line-of-business software requires a different version of a database when going from Server 2003 to Server 2012 and you have to convert data?  What if the line-of-business application changed databases entirely (e.g. going from Advantage database to SQL Express)?  Who is going to handle all that?  Consultants with that kind of skill can cost a lot of money.  Congratulations, you just doubled the cost of your upgrade.
  • Moving to a new version of a line-of-business application may involve massive re-training.  With new features and changes to the user interface, staff may have to learn their way around all over again.  A good example is the big change from Office 2003 to Office 2007 with the advent of the Ribbon.  Personally, I love the Ribbon in MS Office, but nonetheless it required lots of retraining.
  • Your line-of-business application vendor may not exist anymore.  A lot of companies have gone bankrupt thanks to the housing crash of 2008.  Your company may have to switch line-of-business application vendors entirely and that is a whole new ball of wax.

I could go on but you should see my point by now.  Even companies with very strategic plans on handling IT and software deployments are finding themselves in a crunch with Windows Server 2003.  Microsoft may find itself selling some extended contracts for many.

It may be time to be a bit more forgiving towards those that are still on Windows Server 2003 for a bit longer.  And, if you are a consultant familiar with older technologies and their newer counterparts, it may be time to start a new advertising campaign.

JamesNT

SSIS: Why You Should Use Temporary Tables

There seems to be some debate about the use of temporary tables in SSIS. I, for one, highly recommend using temporary tables rather than trying to pull data from different sources straight into your database production tables. Consider the following:

  • You are pulling data across a VPN connection when the connection suddenly fails. Perhaps a router failed or something, but now you are stuck with a partial pull.
  • There is some type of unexpected corrupt data that causes your flow to error out. Again, you are stuck with a partial pull.

In the above situations, you are stuck with production tables in a production database holding records that need to be cleaned out. At best, you’ll have to figure out which records made it and which didn’t and then make up the difference. With temporary tables in your production database, all you have to do is wipe out the temporary tables and start over once you have figured out what the error is. The idea is that once the data makes it into your temporary tables, you have full control and no longer have to worry about VPNs going out or data needing to be changed or cleaned, because all of that has already been done.

So in your SSIS package you pull data over to your database into a series of temporary tables that might in fact have the same structure as your production tables, just with different names. Then you pull the data from your temporary tables into your production tables. Some even go as far as to have a separate database just for temporary tables, which is perfectly fine.
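To make the pattern concrete outside of SSIS, here is a minimal sketch in Python using an in-memory SQLite database. Everything in it is hypothetical: the claims and claims_staging tables, the flaky_source generator that imitates a VPN drop, and the load_with_staging function. It only shows the order of operations: load into the temporary (staging) table, wipe it if the pull fails, and touch production only after the pull completes.

```python
import sqlite3


def load_with_staging(conn, source_rows):
    """Pull rows into the staging table, then promote them to production."""
    conn.execute("DELETE FROM claims_staging")  # always start clean
    conn.commit()
    try:
        for row in source_rows:
            conn.execute(
                "INSERT INTO claims_staging (claim_id, amount) VALUES (?, ?)", row
            )
            conn.commit()  # commit per row to mimic a pull that can stop halfway
    except ConnectionError:
        # Partial pull: wipe the staging table and leave production untouched.
        conn.execute("DELETE FROM claims_staging")
        conn.commit()
        return False
    with conn:  # one transaction for the move into production
        conn.execute(
            "INSERT INTO claims (claim_id, amount) "
            "SELECT claim_id, amount FROM claims_staging"
        )
        conn.execute("DELETE FROM claims_staging")
    return True


def flaky_source():
    """Yields two rows, then fails the way a dropped VPN would."""
    yield (1, 120.0)
    yield (2, 75.5)
    raise ConnectionError("VPN dropped")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE claims_staging (claim_id INTEGER, amount REAL)")

print(load_with_staging(conn, flaky_source()))           # the failed pull
print(load_with_staging(conn, [(1, 120.0), (2, 75.5)]))  # the retry
```

After the failed pull the claims table is still empty, so the retry does not have to work out which rows already made it.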

Remember, when pulling data from other sources, especially outside sources, unless you have some ownership over said sources you control only half of the operation. You have no guarantees as to whether the source is reliable or the data is clean. Therefore, it is very likely your dataflow may be interrupted.

See my post on handling temporary tables in Access.

JamesNT

\MSExchangeIS Mailbox\Messages Queued for Submission is not making progress – Exchange 2010

If you get reports from users that their emails are stuck in either the Drafts or Sent Items folder and are not being sent, run the Microsoft Exchange Troubleshooting Assistant. If the assistant reports the error in the title of this post, check the free space on the drive where the Exchange Transport role is installed and make sure it has at least 2.5GB free.

Once you free up space, go to SERVICES.MSC and restart the Microsoft Exchange Transport Service. Email should start working again.
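If you want to catch this before users start calling, a scheduled check of the free space is easy to write. Here is a minimal Python sketch. It treats the 2.5GB figure as 2.5 GiB, which is an assumption, and the function name and the drive letter in the comment are made up for illustration.

```python
import shutil

# Assumption: the 2.5GB threshold from the post is treated as 2.5 GiB.
MINIMUM_FREE_BYTES = int(2.5 * 1024**3)


def has_enough_free_space(path, minimum_bytes=MINIMUM_FREE_BYTES):
    """True when the drive holding path has at least minimum_bytes free."""
    return shutil.disk_usage(path).free >= minimum_bytes


# On the Exchange server you would pass the transport drive, e.g. "C:\\".
print(has_enough_free_space("."))
```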

Note: If you receive this error on a SBS 2011 box, one of the best ways to free up space is to move the WSUS database to another drive. Instructions for how to do this for SBS 2011 can be found here.

JamesNT