The issue
Since 2004, an undocumented change in the Windows XP/2003 Winsock API has left many Erlang installations unable to listen on network sockets. The bug was not triggered on all Windows machines, and we still do not know under which conditions the system failed or worked.
This bug has been around for a long time and was a big problem for us. Although we do not recommend deploying our system on Windows servers, many users first try Erlang or ejabberd on their Windows workstation. This cryptic problem made Erlang and ejabberd useless in their testing environment and did not allow them to assess the quality of our work.
In December 2006, we decided to launch a call to find a solution to this issue: Erlang mailing list: Sponsoring: Looking for a Windows network developer.
The solution
Several people responded to our call, and we received help analysing the problem from Michael Fogeborg, Jani Hakala and Kenneth Lundin. With their help, we managed to identify the problem and give the OTP team enough input to build a fix.
The problem was that the DuplicateHandle Winsock call fails under unknown conditions on Windows XP. The workaround is to avoid using this call.
We have tested the fix that will be included in the upcoming Erlang R11B-3, and it works like a charm. Erlang can now start in cluster mode and listen on sockets on every Windows XP machine we could find.
So, thank you very much to Michael Fogeborg, Jani Hakala and Kenneth Lundin for investigating with us and fixing the problem.
Next steps
The next steps are the following:
- We are working on a new Windows installer that includes the fix. We expect to release it in the next couple of days.
- The OTP team is about to release Erlang R11B-3, which solves the problem in Erlang.
