Trying to Do Research at a Research University, or What Happens When You Order a Rack Server

When I was a graduate student, my advisor needed an AWS server. Wanting the experience to improve my computer knowledge, I volunteered. The learning curve was steep, but I got it to work, and I learned a lot about the nuts and bolts of computers. I eventually managed the infrastructure for several years, including after graduating. I appreciated the experience but knew I did not want to do it in the future and assumed I would one day have a Ph.D. student to do it for me just like my advisor did. Wrong! It turns out that if you are not a famous professor with students knocking at your door, you have to do the gritty work yourself. In other words, I am still partly a sysadmin, the tasks keep expanding, and there is no end in sight.

The inspiration for this post is my most recent experience, installing and configuring a rack server from Lambda Labs. (Highly recommend them, great customer service and the easiest server buying process I could find.) I purchased a powerful server, to a social scientist at least, and that’s where the easy part ended.

You see, I work at a research university, so I assumed administration would exist to facilitate my research. Wrong! Though I eventually found a home for it with IT Services, it took over a year, from November 2022 to December 2023, of my nagging three administrative arms and two centers to get there. (To be fair, part of the concern was that I thought I would have very sensitive data. Once that data vendor wanted an arm and a leg and I decided not to use them, finding a UCLA group became a lot easier.) I was even onramping with a major center on campus, but their compute specialist quit suddenly. While that center still offered to help me, I was not ready for another multi-month delay, so I started at square zero again and finally, in early December 2023, got IT Services to onboard me into UCLA’s datacenter. You know, the place where all the other high performance servers are and where mine should’ve been a long time ago.

The physical install was more complicated than I thought it would be, though overall it was straightforward, and the datacenter staff were very helpful. 

Until I turned the machine on. It turns out that data center folks are infrastructure folks, meaning their job is to make sure the power, cooling, and fire prevention work. That’s fine, but it means they assume the academic installing the machine knows everything else. Wrong!

Turning the machine on for the first time was exhilarating. There were lights and loud noises, like turning on a Porsche versus a 3 series. The BIOS start-up asked for the networking information – IP address, subnet mask, and gateway – that a non-datacenter staff person had given me, so I assumed everything would be easy. Unfortunately, I could not log on remotely, even after installing OpenSSH. Here is where my new problems began.

On my second visit, I tried six fixes: making sure I was on a UCLA network, triple-checking the networking information, restarting the ssh daemon, confirming ssh was working, confirming my username, and confirming port 22 was open. All checked out, so I tried my last idea, rebooting the machine. The troubles deepened.
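For anyone repeating the port 22 check from a laptop, here is a minimal Python sketch of one way to do it. The function name `port_is_open` and the default timeout are my own choices for illustration, not anything from the server's tooling. It only tests whether a TCP connection can be opened, so a success says the ssh port is reachable, not that a login will work.

```python
import socket


def port_is_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("192.0.2.10")` (a documentation-only placeholder address) from a machine on the same network returns False when the server has no working IP address, which is what I eventually learned was happening.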

Having restarted the machine, I noticed it took a while to log on, with the specific slowdown coming with the network configuration. I quickly learned that the message “A start job is running for Wait for Network to be Configured (<##>s/no limit)” is a bad sign (and that “/no limit” was more like “/120s”). This message meant I no longer had an IP address, meaning the machine was less workable than when I first booted up. Fun!

During this second visit, 90 minutes of torture, I realized I would have to reconfigure my network settings manually. This is done, at least on Ubuntu 22.04 LTS, via the netplan tool and the /etc/netplan/01-netcfg.yaml file. So, I added the IP address, subnet mask, gateway, and DNS info. Over at least a dozen, but probably more, tries, I commented out different lines, tried different DNS addresses, and so on, but still could not get an IP address. Fortunately, an IT administrator was in the data center, but all they could do was confirm that the connections to my machine were working. I asked them about ip a, which is how you check your IP address, and they did not know what I was talking about because I was on a Linux machine. Configuring Windows machines is what this person could’ve helped me with. Windows! At a research university, in a data center! Incredible.

My fourth trip started with what I thought was definitely the solution, adding the line dhcp4: no to the .yaml. This line tells the computer not to request an address over DHCP, which is what you want when you have a fixed IP address, which I do. After that, I still did not have an IP address. So, wrong!
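For reference, here is a sketch of what a static netplan configuration with those pieces can look like, written as a small Python helper that renders the YAML text. Everything specific in it is an assumption for illustration: the helper name `render_netplan`, the interface name `eno1`, the `networkd` renderer, and the 192.0.2.x addresses (a range reserved for documentation), none of which are my real settings. One detail it shows is that netplan wants the address in CIDR form (a /24 suffix, say) rather than the dotted subnet mask I had been given, and Python's `ipaddress` module can do that conversion.

```python
import ipaddress


def render_netplan(interface, address, netmask, gateway, dns_servers):
    """Return netplan YAML text for one statically addressed interface."""
    # netplan expects CIDR notation (e.g. /24), not a dotted subnet mask.
    cidr = ipaddress.ip_interface(f"{address}/{netmask}").with_prefixlen
    dns = ", ".join(dns_servers)
    lines = [
        "network:",
        "  version: 2",
        "  renderer: networkd",  # assumed renderer; desktops often differ
        "  ethernets:",
        f"    {interface}:",  # must match an interface name from `ip a`
        "      dhcp4: no",  # fixed address, so do not ask DHCP for one
        f"      addresses: [{cidr}]",
        "      routes:",
        "        - to: default",
        f"          via: {gateway}",
        "      nameservers:",
        f"        addresses: [{dns}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; substitute what your network staff provide.
    print(render_netplan("eno1", "192.0.2.10", "255.255.255.0",
                         "192.0.2.1", ["192.0.2.53"]))
```

The printed text would go in the file under /etc/netplan, followed by sudo netplan apply. YAML is picky about indentation and about the space after each colon, so generating the text is one way to avoid typos, though editing the file by hand works just as well.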

At this point, I started researching how to install a GUI so that I could pursue a more user-friendly network configuration route. Figuring that that could add more complications and take a while, I put that idea on hold.

Finally finally finally, I had an epiphany when looking at the ip a output. There were multiple blocks of output, each headed with an identifier. The configuration .yaml had already been populated with one identifier, but there were two others in the ip a output, the only difference being the final digits (0, 1, 2). I then remembered that the rack server has 3 ethernet ports and I had moved the ethernet cord to different jacks several times. So, I changed the 0 in my configuration file to 1, closed the file, reapplied the file, prayed, and typed ip a.
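In hindsight, a quick listing of every interface the machine knows about would have shown me the mismatch sooner. Here is a minimal Python sketch that does so on Linux by reading the kernel's files under /sys/class/net. The function name `link_states` is hypothetical, and the path is passed as a parameter only so the function can be tried against a test directory.

```python
from pathlib import Path


def link_states(sys_net="/sys/class/net"):
    """Map each network interface name to its kernel-reported operstate."""
    states = {}
    for iface in sorted(Path(sys_net).iterdir()):
        state_file = iface / "operstate"
        # Skip entries that are not interfaces (no operstate file).
        if state_file.is_file():
            states[iface.name] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    for name, state in link_states().items():
        print(f"{name}: {state}")
```

The names it prints are the same identifiers that head each block of ip a output, so they can be compared directly with the name in the netplan file. One caveat: an interface that the operating system has not brought up reports "down" whether or not a cable is plugged in, so the state column is a hint rather than proof of which jack holds the cord.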

SUCCESS!!!!! My machine recognized its IP address and I ssh-ed in from my laptop with no problem.

To summarize, the problems were that my initial configuration was wrong (I probably had the ethernet in the “wrong” port), I did not declare a fixed IP address, and I was assigning the network info to the wrong port.

Remember, as soon as my machine turned on the first time, I had no one helping me. At a premier research university, I had to learn network configuration on top of rack installation. I spent about 5 hours troubleshooting: in the data center, searching online, and over email. Ultimately, ChatGPT and a colleague who went through similar headaches several years ago were particularly helpful. This effort was after 12-15 hours of meetings just to get someone to host the server. I have a revise & resubmit on my desk, a paper I need to get off my desk, a peer review to complete, research assistants to guide, and teaching to conduct. That is my job and 20 hours is about 18 too many that I lost.

I know the frustration has not ended. Next week, I have to figure out how to attach a Synology NAS to the server. I already had to install the internal Synology drives myself, and next week will be a networking card. Then, I will connect it to the server, things won’t work, I’ll get angry for hours, and then things will work. Can’t wait.

Perhaps I am naive or entitled for thinking that a research university should encourage research, not technical administration. People complain about IT everywhere, after all. Whether this is how the world is, it is not how it should be. It is also not how it is everywhere. A friend at a major East Coast research university ordered a rack server, shipped it to his IT department, and they configured everything. That is how it should be. 
