Content Migration Progress Tracking

When moving hundreds of thousands of courses between WebCT Vista and Desire2Learn, keeping track of what made it through which stage seems like an obvious hindsight thing to do. I added that last bit because we started to notice where things fell between the cracks starting to pile up. The basic process…

    1. Through Oracle SQL we populate a middleware database with those courses which meet acceptable criteria.
      CRACK: With the wrong criteria, courses are not even selected.
    2. Through Oracle SQL we generate XML listing in 50 count sets of the courses.
      CRACK: A subset of data loaded into the database may be extracted.
    3. A shell script automates running the WebCT command-line backup process to create a package for each course.
      CRACK: The command-line backup fails on some courses.
    4. Desire2Learn scripts pick up the files and convert WebCT formatted packages to Desire2Learn.
      CRACKS: Too big fail. Too long paths fail. This step can fail to create CSV files for the next step.
    5. Converted packages are placed in a queue to be imported into Desire2Learn.
      CRACKS: WebCT Vista courses can have 1,000 characters in the name and D2L fails if there are more than 50. Courses named the same as a previously loaded one but with a different file name loads both into the same course.

So, there are apparently five different stages and eight potential failures to track and no end-to-end tracking to even know what is missing. Which means inventing something to check logs for errors. 

First thing, I tried writing SQL to create an ordered list of the courses that are available.

The WebCT backups were a little tougher to convert into a useful list. The file names follow the format of Course_Parent_1234567890.bak. They were also in various directories, so I ended up doing something like this to get a list of the files, strip off the parents & time stamps, strip off the directories, and order it.

ls */*.bak | awk -F_ ‘{print $1}’ | awk -F\/ ‘{print $2}’ | sort

So I have two ordered lists. Anything in the D2L one and not in the WebCT one ought to be the work of clients making their own stuff. Anything in the WebCT one and not in the D2L one ought to be my missing ones. Only almost every line is a mismatch.

Visually comparing them, I realized the same courses had in effect different names. All the spaces were converted to underscores. So I ended up adding a sed to convert spaces to underscores after the sort.

Then I wrote some more scripts.

    • Go through the logs from #4 and #5 to display the errors. With it I was able to compare my list of missing with the errors and confirm why they did not come through.
    • Some of the cracks can be addressed by making new import files. So I wrote a script to add those lines to a redo.csv file. Touch up the problems and go.

Basically at this point it only covers 3-5. At some point I am going to have to check steps 1-5. 

from Rants, Raves, and Rhetoric v4


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s