Migration to VirtualBox hypervisor on FreeBSD

It’s a step I thought about a lot. Our primary virtualization server was feeling a bit “outdated”… it was set up almost six years ago with Microsoft Virtual Server 2005 and then upgraded to the “R2” release, while we looked forward to the (at that time) promising technology code-named Viridian (now called “Hyper-V”). After Microsoft’s fall, I evaluated the Citrix XenServer and VMware ESXi hypervisors, but the first one, based on Xen, was not able to virtualize Windows without support for Intel VT or AMD-V in the host’s CPU, and our server was bought just a few months before these technologies came out. ESXi seemed the right choice for us, but I wasn’t able to get a test machine that complied with all the ESXi hardware requirements in a timely fashion. In the meantime I started looking at VirtualBox on FreeBSD… until a couple of weeks ago I did not consider it a viable solution to operate as a server-side headless emulator, but after some days of studying and testing I realized that it is actually a strong choice if you need a simple yet reliable hypervisor to consolidate a few non-FreeBSD servers!

Now we have a good virtualization host, on which both Windows guests and FreeBSD jails can run at the same time. With this move, the last Windows Server host is gone from our server farm! 🙂

I’ll post some technical details about this migration as soon as possible.

Upgrade pfSense cluster to 2.0-RELEASE

Over the weekend we migrated our main firewall system to the latest release of pfSense. Although we had spent some hours testing in a pilot environment, we ran into a couple of minor issues in the process, as outlined below.

We upgraded the main node first (as suggested by the “Redundant Firewalls Upgrade Guide”). The secondary node did not sync immediately after its upgrade, because the old webConfigurator SSL certificate was not selected by default. Selecting the right certificate and rebooting the second node solved the problem.

Some bits of the OpenVPN configuration (we run a couple of site-to-site VPNs) were not retained correctly: the tun(4) interfaces disappeared in favour of the new “ovpns” interfaces, so a reassignment was needed. After a couple of clicks everything was working flawlessly again.

A little more work is planned to polish the configuration (for example, the FTP proxy is now implemented in a different way, and there is no need to allow TCP port 21 to external WAN addresses), but all the main functions are here, and the performance of the webConfigurator interface seems to be drastically improved.
In conclusion, we are very satisfied with this new version, which we have been running in production on some customers’ new firewalls since it was released a few weeks ago.

High availability of services with ZABBIX and DNS failover

This blog was born some years ago only to test WordPress, so there is no real reason to maintain it, but from time to time I like to post here about a change I make in our infrastructure, or about a product or technology I find interesting. I do it more to remind myself when I did or read something than to actually inform someone out there, so please excuse the fuzzy style of the contents!

Today I put into production a procedure that makes inbound Internet traffic automatically fail over to a secondary ISP link, using the well-tested ZABBIX monitoring platform.

Our primary NOC uses two independent and fully redundant links (two-node firewall, two routers, etc.) to access the Internet, and all production-grade services (DNS, mail, IM, web, etc.) are continuously accessible on the public IP addresses of both links.

Until today, when a connection failure occurred, all clients in our internal networks were immediately able to continue browsing by using the failover link, thanks to a simple source-based routing rule applied by our pfSense cluster. Clients on the Internet, however, couldn’t reach the services through the secondary path until the RRs in our DNS zones were manually changed to answer resolvers with the public IP address in the range of our secondary ISP.

I evaluated a couple of good external DNS failover services: Dynect Active Failover and DNS Made Easy’s service. The first was too expensive for our needs and the second was missing the ICMP ping check we wanted to use.
Then I tried the failover host support of the TinyDNS package for pfSense. It works pretty well, but it would need two public IPs (one from each ISP range) to publish the djbdns service for the dynamically updated zone, and at this time the range from our secondary provider is exhausted.

So the idea came up to run the dynamic zone on the same DNS servers we use for our public zones, but what could update the RRs in a reliable way? I was pretty confident in the link failure detection of pfSense, which I still use to redirect outbound Internet traffic, but I didn’t like the idea of trusting any other link failure detection script or daemon running inside my network… until I had a flash: ZABBIX has been reliably notifying me of link failures and recoveries for several months now. Maybe I could configure it to run the nsupdate(1) command against our primary DNS server each time such an event is triggered!

In fact it was pretty trivial to configure a new custom media type “script” (named “nsupdate_HA”) and execute it as an “operation” from the action performed when the “link failure” trigger fires, as shown in this screenshot.

From now on, the hostname of each server publishing a “mission-critical” service can be stored as a CNAME pointing to an A-type record in the ha.valsania.it zone, which is automatically set to the right available public IP address. I measured that the reaction time to a link state change is around 40 seconds: this will definitely make me sleep better at night!


UPDATE:
Maybe it can be useful for someone to take a look at the simple shell script I wrote to accept input from ZABBIX, or maybe someone can suggest some improvements!
Three arguments are expected (the recipient, the subject and the body of the message), but we only read the second one to know what’s happening, in order to execute the proper failover and failback actions.
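
The original shell script is not reproduced here, so what follows is only a rough sketch of the same idea, written in Python rather than shell. Everything specific in it is an assumption made for illustration: the record name, the DNS server address, the two public IPs (taken from documentation ranges), the TTL, and the guess that the subject starts with “PROBLEM” on a failure and “OK” on a recovery. Only the zone name, the three-argument convention and the use of nsupdate(1) come from the post.

```python
import subprocess
import sys

# Hypothetical values: adjust to the real zone, server and addresses.
ZONE = "ha.valsania.it"          # zone named in the post
RECORD = "www.ha.valsania.it."   # hypothetical A record inside that zone
DNS_SERVER = "192.0.2.53"        # hypothetical primary DNS server
PRIMARY_IP = "198.51.100.10"     # hypothetical address on the primary link
SECONDARY_IP = "203.0.113.10"    # hypothetical address on the secondary link
TTL = 30                         # hypothetical TTL, kept short for fast failover


def pick_address(subject):
    """Choose the address to publish from the notification subject.

    Assumes the subject starts with the trigger status ("PROBLEM" or "OK").
    Returns None when the subject is not recognised.
    """
    status = subject.strip().upper()
    if status.startswith("PROBLEM"):
        return SECONDARY_IP   # failover
    if status.startswith("OK"):
        return PRIMARY_IP     # failback
    return None


def build_nsupdate_input(address):
    """Build the command text that nsupdate(1) reads from standard input."""
    return "\n".join([
        f"server {DNS_SERVER}",
        f"zone {ZONE}",
        f"update delete {RECORD} A",
        f"update add {RECORD} {TTL} A {address}",
        "send",
        "",
    ])


def main(argv):
    """Entry point: argv holds the script name, recipient, subject and body."""
    if len(argv) != 4:
        print("usage: nsupdate_HA RECIPIENT SUBJECT BODY", file=sys.stderr)
        return 2
    subject = argv[2]  # only the second argument is inspected
    address = pick_address(subject)
    if address is None:
        return 0  # nothing to do for unrelated notifications
    # Authentication (for example a TSIG key) is deliberately left out here.
    result = subprocess.run(
        ["nsupdate"], input=build_nsupdate_input(address), text=True
    )
    return result.returncode
```

To use it as an alert script, one would end the file with `sys.exit(main(sys.argv))`. Deleting the old A record before adding the new one keeps a single address published at any time, which is what the CNAME-based scheme described above relies on.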

CommuniGate VoIP services

A lot of time has passed since my last post, a lot of work has been done, and a lot of hours have been spent studying and testing new solutions to better serve the needs of our corporate IT environment and those of our customers.

One of the biggest improvements concerns the oldest form of synchronous collaboration: phone calls! One week ago our CommuniGate system began to route all voice traffic in and out of my business. Stalker’s product is the last piece of software I tested to manage this type of communications (another good candidate was Asterisk) and it proved to be the best choice for businesses of all sizes, thanks to its fair price and its amazing scalability and reliability, which let it serve five thousand users as well as five with the same high level of performance and functionality.

The wide range of supported transport and access protocols lets our users connect and keep in contact, from almost any client software or device on any platform, with any customer or partner who might rely on public or private communication systems and networks (e-mail, Jabber, GTalk, SIP, PSTN, …), dramatically simplifying the administrative effort needed to connect these entities.

Many aspects were involved in this evaluation, which I can’t describe here and now, but the results of our tests convinced us that, even if CommuniGate’s strong background in the carrier field makes it miss a lot of enterprise features at this time, it has everything it needs to fight and win in the enterprise market, first of all because of its rock-solid architecture, which lets it run and be supported on almost twenty different computer architectures!

New Cisco router

Today I put the new Cisco 877 Integrated Services Router into production on my secondary Internet connection. Since the previous model (C837) was considerably older, I’m pretty impressed by the power of this new machine, which features a fully managed 4-port FastEthernet switch. Thanks to VLAN support, this router will even be able to outlive the current ADSL line, since I can easily configure one FastEthernet port to be the WAN interface.
The unmatched features of Cisco IOS 12.4T make this router one of the best-priced options in the SME market.

CommuniGate in production

The migration process I began some days ago to the new CommuniGate Pro platform was successfully completed today. Surely, a lot of things could be improved, but all the major features needed by our internal corporate users are already available.

We ran into some trouble porting calendar and contact data from the old system, but we have managed to fill all the gaps we have found so far. The new system will allow us to consolidate both asynchronous and synchronous collaboration services (such as mail, calendar, contacts, presence, IM and even VoIP) into one single, robust, dependable, highly scalable and secure Unix server.

We are trying the Mailshell SpamCatcher plugin to filter the spam remaining after all incoming messages have been inspected by our Sendmail frontend servers, and it seems to be quite a good alternative to the Exchange Intelligent Message Filter which is based on Microsoft’s SmartScreen technology.

This work is the biggest step in a broader migration towards technologies and products outside the Microsoft world, since the Redmond software house failed to keep its promise to bring its customers reliable but cost-effective IT solutions. Almost all of today’s Microsoft products are unreliable and costly in terms of both implementation and management.

CommuniGate upcoming

Today I officially started down the path of migrating from Microsoft Exchange to Stalker’s CommuniGate Pro platform. A lot of time has passed since I discovered this piece of software, but what I learned about it in the past months has convinced me to put it into production. CommuniGate has a lot of strengths which make it a good alternative (even better in many respects) to Microsoft’s mail server. I obviously can’t list them all now, but the reliability of this software’s architecture (which comes from simplicity) can be enough to persuade a lot of Exchange2k7-sceptic sysadmins to take a look at it.

I’m currently running CommuniGate 5.2.6 in a FreeBSD 7.0/amd64 jail. From now on, all SMTP traffic from and to my corporate domains is routed by this server. As the migration process goes on, all contents of the Exchange private and public stores will be gradually moved to CGP, until the Exchange host is left empty and can be decommissioned. Corporate users whose mailbox has been migrated can already use the new Pronto! web interface at the URL https://mail.valsania.it/Pronto.

First WPMU upgrade

Today I found the time to upgrade the WordPress MU platform that runs the Valsania Corporate Blogs collection from version 1.5.1 to release 2.6 (which is based on WordPress 2.6). It was the first time I upgraded a WPMU instance in a production environment. The upgrade procedure was extremely straightforward, and I must say I’m quite impressed by that, since my WPMU configuration is very complex (many plugins and customizations were made, but all of them should have been made consistently with the architecture). At the same time I upgraded all the other WP-based blogs I manage to the 2.6.1 release.

Corporate redirection web site

I’ve finally migrated my corporate URL redirection service from a handwritten MSFT-IIS-ASP solution to a new BSD jailed system running a Drupal instance. The new solution uses the Front Page, Path Redirect and Web File Manager modules, and enables all authorized users both to maintain all the official URL redirects and to manage a dedicated WebFM file repository, from wherever they are working.

The base URL of the new service is http://go.valsania.it.

wpDirAuth

I finally found the time to make the wpDirAuth plugin work on both WordPress 2.5.1 and WordPress µ 1.5.1. I need to migrate from Telligent’s Community Server to the WPMU platform as soon as possible, since I’m planning to port my entire corporate infrastructure from MSFT to the more dependable BSD Unix technology.

Unfortunately version 1.2 of this plugin simply didn’t work on the latest versions of WP, so I had to apply the Patch for WordPress 2.5 compatibility kindly published by Adam Yearout. So far I’ve only had the time to test it against Microsoft Active Directory LDAP servers, but I plan to try it in an Apple Open Directory environment before putting it into production. The pilot blog collection can be accessed at the well-known Valsania Corporate Blogs WPMU instance.

UPDATE:
I’ve just tried to bind against an Apple Open Directory LDAP service, and the process is quite straightforward: the only real difference is the user object’s attribute to search for to identify the user who is logging in (sAMAccountName for AD, uid for OD), as shown in the following image.

As you can see, in this example we have an OD domain named mydomain.local, and we are using the unprivileged user named dsquery to bind to the LDAP service.

NOTE: remember to populate the EMailAddress attribute of your users in Open Directory if you wish the required E-mail field in the WordPress user profile to be filled in automatically upon the first logon.
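
To make the AD versus OD difference concrete, here is a minimal Python sketch of how a login search filter could be built for each directory type. It is not the plugin’s own code (wpDirAuth is written in PHP); the function names and the "ad"/"od" keys are made up for illustration. The two attribute names come from the update above, and the escaping follows the LDAP search filter rules of RFC 4515.

```python
# Attribute holding the login name in each directory type (from the post).
LOGIN_ATTRIBUTE = {
    "ad": "sAMAccountName",  # Microsoft Active Directory
    "od": "uid",             # Apple Open Directory
}

# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value):
    """Escape a user-supplied value so it cannot alter the filter structure."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def login_filter(directory, username):
    """Return the search filter that locates the user who is logging in."""
    attribute = LOGIN_ATTRIBUTE[directory]
    return f"({attribute}={escape_filter_value(username)})"
```

For example, `login_filter("ad", "jdoe")` gives `(sAMAccountName=jdoe)` while `login_filter("od", "jdoe")` gives `(uid=jdoe)`; the rest of the bind and search procedure stays the same.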