Thursday, September 27, 2007

So how many users do you have?

The potential customer asks: "So, how many users do you have on your system?"


System engineer 1: - "Well, perhaps 25.000 concurrent users during peak hours."

System engineer 2: - "We have 80.000 active sessions during peak hours."

System manager: - "Every week, more than 350.000 unique users log into the system."

Product Manager: - "We have more than 1.000.000 active users!"

Sales manager 1: - "We have 4.000.000 users!"

Sales Manager 2: - "We have 10.000.000 logins a month!"

Managing director: - "We have 100.000.000 page requests a week!"


So who's lying? Answer: They might all be telling the truth. The problem is not the answer but the question. To illustrate my point: imagine you wanted to choose the vendor with the most users. You asked several vendors, and each of them answered with a different kind of number, as above. The vendor that quoted the highest number would probably win - even though this might be the vendor with the fewest users.
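To make the point concrete, here is a minimal sketch of how one and the same request log can produce every one of those answers. The log format, the user names and the five-minute "concurrent" window are assumptions made up for this illustration; they are not how it's learning counts anything.

```python
from datetime import datetime, timedelta

# Hypothetical request log: (timestamp, user, session id, is this request a login?)
T0 = datetime(2007, 9, 24, 9, 0)
LOG = [
    (T0,                                "anna", "s1", True),
    (T0 + timedelta(minutes=1),         "anna", "s1", False),
    (T0 + timedelta(minutes=2),         "bob",  "s2", True),
    (T0 + timedelta(minutes=2),         "anna", "s5", True),   # second browser
    (T0 + timedelta(minutes=30),        "anna", "s3", True),
    (T0 + timedelta(days=1),            "bob",  "s4", True),
    (T0 + timedelta(days=1, minutes=1), "bob",  "s4", False),
]


def page_requests(log):
    """Every row in the log is one page request."""
    return len(log)


def logins(log):
    """Requests that were flagged as a login."""
    return sum(1 for _, _, _, is_login in log if is_login)


def unique_users(log):
    """Distinct users seen anywhere in the log."""
    return len({user for _, user, _, _ in log})


def active_sessions(log, at, timeout=timedelta(minutes=20)):
    """Distinct sessions with a request in the `timeout` window ending at `at`."""
    return len({sid for ts, _, sid, _ in log if at - timeout < ts <= at})


def concurrent_users(log, at, window=timedelta(minutes=5)):
    """Distinct users with a request in the short window ending at `at`."""
    return len({user for ts, user, _, _ in log if at - window < ts <= at})


at = T0 + timedelta(minutes=3)
print("page requests:   ", page_requests(LOG))         # 7
print("logins:          ", logins(LOG))                # 5
print("active sessions: ", active_sessions(LOG, at))   # 3
print("concurrent users:", concurrent_users(LOG, at))  # 2
print("unique users:    ", unique_users(LOG))          # 2
```

Two people, five different "user counts" between 2 and 7, and none of them is wrong. Which one you hear depends on who picks up the phone.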


(Disclaimer - these numbers are fictional and not related to it's learning. But I will give you a number that is not: we had 262.240 unique users logging on to itslearning.com one or more times during the past seven days.)

Monday, September 24, 2007

IMS Enterprise Services

We have recently made a few changes and enhancements to how it's learning can be integrated with external applications and services. Although we have more than sixty user data integrations (think: users, groups, courses, etc.) behind us, all of these integrations are batch-based. This means that the changes happening to your user data are only imported into it's learning once a day. With the support of IMS Enterprise Services v.1.0 (SOAP based) it is now possible to build integrations that allow for on-the-fly updates of user data within it's learning.


Q: Does this then mean that we can now offer all of our customers IMS ES?
A: Nah. It takes two to tango. An integration between it's learning and an external student administrative system involves two parties - the "sender" of user data (the administrative system) and the receiver (us). Not many administrative systems support IMS ES yet. There are some honorable exceptions. Capita's SIMS in the UK supports it (if you allow for a slightly loose interpretation) and several customers are currently implementing support in their own homebrewed systems.
Q: Is it possible to customise the IMS ES solution?
A: Even though IMS ES is a pretty clear and precise specification, it allows for extensions. In our implementation of the IMS ES specification we have taken the need for customisation into account. This means that we can allow for extensions and for different interpretations of the specification.

Q: Is it free?
A: No. In addition to maybe needing the help of one of our consultants, you will need a maintenance and support agreement. But investments in integrations usually pay off pretty quickly - the labor required to maintain user data manually is usually not free...
Q: What technology is it built upon?
A: The interface (and also the specification) is purely SOAP based, so it should work with a wide range of platforms and technologies. To move messages around, transform data and add customisation capabilities we use MS BizTalk 2006.

Wednesday, September 12, 2007

Downtime, part 2

My last post touched on some of the reasons why we scheduled planned downtime when updating it's learning to version 3.2 Thursday evening/night. Unfortunately, the following day we got some unplanned downtime and had to temporarily roll most of our customers back to version 3.1 while troubleshooting and sorting out the bug. Fortunately we quickly found a resolution to the problem, but it is a textbook example of how easy it is to mess up your performance in a large data center.

To explain what happened, I have to start off with the basics of how our hosting environment works (simplified version!):



1. Content switches. This is the entry point of any request to our web servers. The primary function of the content switches is to route you to a pool of web servers based on which it's learning "site" you belong to (our customers are divided into 4-5 different pools of web servers at the moment - maybe a subject worth blogging about at a later time). The content switches also terminate https traffic, load balance the web servers and cache static files.



2. Web server(s). Every pool consists of 5-8 web servers. This is where the actual application is installed. A request is assigned to a server based on the load on the servers in the pool (so for every page you click inside it's learning you could be served by a different server).



3. Session state server. HTTP is a stateless protocol. Every request from your browser to the server is opened and closed independently of the others. To keep track of who you are, a session ID is created. This session ID is stored in a cookie on your computer and on the server. Since we have a lot of web servers and you can be assigned a different server on every request, all sessions are stored on the session state server. When you access http://www.itslearning.com/ you are assigned a session on the session state server. This session will continue to live there until 20 minutes after you close your browser. So with the amount of traffic we receive, new sessions are created and old ones expire every second.


4. Database server. This is where the customer databases are stored. Depending on the size of the customer there could be one, two or a heap of customers residing on one database server. It's learning is a very database-dependent application, and the amount of traffic makes it important to have finely tuned database servers.


5. File server. The file server(s) act as a client for the SAN where all the files uploaded into it's learning are stored. These are directly connected with dual fiber cards to a very, very expensive hard drive.
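For the programmers among you, steps 2 and 3 can be sketched in a few lines of Python. This is a toy model and not our actual code: the class names, the "least loaded server wins" rule and the in-memory dictionary are all assumptions made for the illustration, and the 20 minutes are modelled as an idle timeout counted from your last request.

```python
from datetime import datetime, timedelta
import uuid

SESSION_TIMEOUT = timedelta(minutes=20)  # the 20 minutes mentioned above


class SessionStateServer:
    """In-memory stand-in for the shared session state server (hypothetical)."""

    def __init__(self):
        self._last_seen = {}  # session id -> time of the most recent request

    def create(self, now):
        session_id = uuid.uuid4().hex
        self._last_seen[session_id] = now
        return session_id

    def touch(self, session_id, now):
        """Refresh the session and return True if it is still alive."""
        last = self._last_seen.get(session_id)
        if last is None or now - last > SESSION_TIMEOUT:
            self._last_seen.pop(session_id, None)  # expired or unknown
            return False
        self._last_seen[session_id] = now
        return True


class WebServer:
    """A web server keeps no session data itself; it asks the shared store."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions
        self.load = 0  # crude load measure: requests handled so far

    def handle(self, session_id, now):
        self.load += 1
        if session_id is None or not self.sessions.touch(session_id, now):
            session_id = self.sessions.create(now)
        return session_id


def pick_server(pool):
    """Least-loaded server wins; a real load balancer uses richer signals."""
    return min(pool, key=lambda server: server.load)


store = SessionStateServer()
pool = [WebServer(f"web{i}", store) for i in (1, 2, 3)]
start = datetime(2007, 9, 12, 8, 0)

cookie = None      # the session ID the browser sends back on each request
served_by = []
seen_ids = []
for minute in (0, 1, 2):
    server = pick_server(pool)
    cookie = server.handle(cookie, start + timedelta(minutes=minute))
    served_by.append(server.name)
    seen_ids.append(cookie)

# 28 minutes of silence later the old session has expired and a new one is made.
late_id = pick_server(pool).handle(cookie, start + timedelta(minutes=30))

print(served_by)                 # ['web1', 'web2', 'web3']
print(len(set(seen_ids)) == 1)   # True: same session on three different servers
print(late_id != seen_ids[0])    # True: a fresh session after the timeout
```

The thing to notice is that every single request, on every web server, goes through the one shared session store. That makes it a very busy and very critical box.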


So what happened? The problem came with a new security measure introduced to it's learning. You can now only access files and similar content from a separate domain (files.itslearning.com). What we didn't realize was that the implementation created a new session on our session state server for every file that was opened by a user in it's learning. This was simply too much for the session state server, and it froze. We ended up with one of these guys on our servers:


Thursday, September 06, 2007

Why downtime?

So if you have been trying to access it's learning for the last few hours, you have probably gotten a glimpse of the following message:

A few of you might be wondering what exactly goes on behind the scenes when we take down it's learning and start upgrading. So for the most curious of you, here's a simplified list of what our eager staff of system engineers and developers (I currently count eight of us here!) are currently doing:

  1. Backup. Every update starts with a full backup of all customer data. This typically takes a couple of hours, including verifying that the backup has run properly on every customer database.
  2. Patching. Before we start reinstalling we make sure that every server is patched properly.
  3. Databases are updated with the changes required by the new version of our application.
  4. Optimization scripts are run on each database (new indexes are added, obsolete data is removed, etc).
  5. The core application is installed on all of our web-servers.
  6. Connected applications are upgraded (like the exam, mobile, community and import applications).
  7. Backup verification. We make sure that all backups are running properly after the upgrade is finished.
  8. Documentation. All of our configuration documentation is updated to make sure it is now reflecting our new data center configuration.
  9. Testing. A crew of testers makes sure that the application is installed properly before customers are let back onto the servers.
  10. We let you back in. And funnily enough, even at four in the morning hundreds of users start logging in :-)
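If you like to think in code, the list above is basically an ordered runbook where nobody is let back in unless every step before it has succeeded. A minimal sketch, with made-up step names and dummy checks (our real procedures are checklists and people, not a Python script):

```python
def run_upgrade(steps):
    """Run the steps in order and stop at the first one that fails.

    Returns the names of the completed steps and the name of the failed
    step (or None if everything went through).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical steps; each action returns True on success.
steps = [
    ("backup", lambda: True),
    ("patch servers", lambda: True),
    ("update databases", lambda: False),  # pretend this one goes wrong
    ("install application", lambda: True),
]

completed, failed = run_upgrade(steps)
print(completed)  # ['backup', 'patch servers']
print(failed)     # update databases
```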

Wednesday, September 05, 2007

it's learning 3.2 is just around the corner...

Yup, tomorrow is the big day. We're upgrading it's learning to version 3.2. I am sure that feelings are mixed amongst our superusers at the moment; some of you probably remember that our last upgrade took the entire site down for a day and left a few of our customers with performance issues for about a week. Others feel that the timing could have been better - if the upgrade had happened a bit earlier teachers would have been better prepared for the new features.

I am not brave or foolish enough to give a 100% guarantee that the upgrade will be unproblematic. There are still some known minor bugs in the new version of the application, and we know from experience that the swarm of it's learning users out there is collectively even better at digging up bugs than our professional testers... But I know this: never have we spent more resources on preparing for an upgrade, never have we done more performance testing than this time, and never have we had more beta testers submitting feedback to us during our beta period. 2500 users have been involved in beta testing and a whopping 689 posts have been made to the beta forums! In addition to this we have made significant investments in testing and verification environments over the last few months (VMWare and Dell are sending us love letters) and rewritten our change and release procedures.

So what's new in it's learning? For this upgrade we've focused on the basics. We have identified some of our most used and popular tools and given them a proper upgrade. There's a new editor coming, based on DHTML, that is much less troublesome than the existing ActiveX and Java editors. The message/email system has had a complete makeover, and so has the discussion tool. More features have been added to the mobile application and some useful improvements have been made to the assignment tool. The online help has been completely rewritten and SCORM content now has better support within it's learning. There are also a few changes happening under the bonnet. Some extra layers of security have been added, and minor performance tweaks are also part of the release.

So come Friday morning all of our customers should have a good impression of how well the upgrade went. Please don't hesitate to share your feedback and thoughts here on my blog!

Rising Star?

We seem to be heading into the award season again. Yesterday it's learning was awarded the Rising Star prize, a local award for companies in the Region of Hordaland. Among the prominent jury members was Victor Normann, Ph.D. in economics and former minister of Trade and Industry.

http://www.deloitte.no/