
HighPoint RocketRAID 1742 in CentOS 5.2

After a bit of a fight with CentOS and its broken Realtek 8169 driver (called r8169, which also claims Realtek 8168 devices but screws up when detecting the link status), I finally got to play with a RocketRAID 1742 controller from HighPoint. Where Promise dropped the ball with supporting hardware RAID on Linux (basically you get 2 or 4 extra SATA ports but the RAID controller plays dead), HighPoint promised us true RAID under Linux without any wonky software drivers.

Note that the RPM package for the DKMS rr174x driver is attached on the next page.

After installing CentOS 5.2 x86_64 on a separate SATA drive (to spare me the hassle of figuring out how to install on a RAID drive, which is not supported by default), I was not surprised to see that the two Maxtor drives (500 GB each) were not detected by Linux. The last time somebody told me they had ‘true’ RAID support, I saw the individual drives that were supposed to be in a RAID setup (when you see /dev/sda and /dev/sdb while the drives should be in mirror mode, you know you are in trouble).
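
For what it is worth, a quick way to tell real hardware RAID from the fake kind is simply counting the disks the kernel sees; a rough check:

  # with true hardware RAID a two-disk mirror shows up as ONE logical disk
  cat /proc/partitions
  ls /dev/sd?    # seeing both sda and sdb for a mirrored pair means fake RAID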

On the HighPoint site you can find the driver disk for CentOS 5 (as it is in fact Red Hat Enterprise Linux 5), the clients (CLI, GUI and web based) and the source package. One thing missing (in my opinion) is a download with just the binary drivers themselves. Sure, you can extract them from the floppy image (by mounting the image, finding the driver archive, piping it through cpio, etc.) but it is a lot of work just to get a 100 kB driver out of a binary image file.
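
For the record, the extraction dance goes roughly like this. The image name below is an assumption, but Red Hat style driver disks normally hide the modules in a compressed cpio archive called modules.cgz:

  mkdir -p /mnt/floppy /tmp/rr174x
  mount -o loop rr174x-rhel5.img /mnt/floppy     # image file name is an assumption
  cd /tmp/rr174x
  zcat /mnt/floppy/modules.cgz | cpio -idmv      # unpacks <kernelver>/<arch>/rr174x.ko
  umount /mnt/floppy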

When you install gcc and kernel-devel you can compile your own driver. Even though the compiler produces a lot of warnings, the build completes without any problems and the custom-compiled driver is at your disposal faster than you can extract it from that darn floppy image.
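
The build itself is nothing special; something along these lines, assuming the source tarball unpacks with a Makefile at the top (the exact file and directory names will differ per driver release, so check the README in the source package):

  yum install gcc kernel-devel          # build prerequisites
  tar xzf rr174x-linux-src.tar.gz       # tarball name is an assumption
  cd rr174x-linux-src
  make                                  # builds rr174x.ko against the running kernel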

After insmod-ing the rr174x.ko driver into the kernel, I noticed it took forever to load, and dmesg was not happy either:

  rr174x:RocketRAID 174x controller driver v2.1.08.0710 (Jan  3 2009 03:59:59)
  rr174x:adapter at PCI 3:7:0, IRQ 225
  rr174x:start channel [0,0]
  rr174x:start channel [0,1]
  rr174x:start channel [0,2]
  rr174x:start channel [0,3]
  rr174x:[0 1] Start channel soft reset.
  rr174x:[0 2] Start channel soft reset.
  rr174x:channel [0,1] started successfully
  rr174x:channel [0,2] started successfully
  rr174x:[0 0] Failed to perform channel hard reset.
  rr174x:[0 3] Failed to perform channel hard reset.
  scsi14 : rr174x
    Vendor: HPT       Model: DISK_14_0         Rev: 4.00
    Type:   Direct-Access                      ANSI SCSI revision: 05
  SCSI device sdb: 976617472 512-byte hdwr sectors (500028 MB)
  sdb: Write Protect is off
  sdb: Mode Sense: 2f 00 00 00
  SCSI device sdb: drive cache: write through
  SCSI device sdb: 976617472 512-byte hdwr sectors (500028 MB)
  sdb: Write Protect is off
  sdb: Mode Sense: 2f 00 00 00
  SCSI device sdb: drive cache: write through
  sdb:rr174x:hpt_reset(14/0/0)
  rr174x:start channel [0,2]
  rr174x:channel [0,2] started successfully
  rr174x:start channel [0,2]
  rr174x:channel [0,2] started successfully
  rr174x:start channel [0,1]
  rr174x:channel [0,1] started successfully
  rr174x:start channel [0,1]
  rr174x:channel [0,1] started successfully
  rr174x:hpt_reset(14/0/0)
  rr174x:start channel [0,2]
  rr174x:channel [0,2] started successfully
  rr174x:start channel [0,2]
  rr174x:channel [0,2] started successfully
  rr174x:start channel [0,1]
  rr174x:channel [0,1] started successfully
  rr174x:start channel [0,1]
  rr174x:channel [0,1] started successfully
  rr174x:hpt_reset(14/0/0)
  rr174x:start channel [0,2]
  rr174x:channel [0,2] started successfully
  rr174x:[0 2 0] too many resets - make drive offline.
  rr174x:start channel [0,1]
  rr174x:channel [0,1] started successfully
  rr174x:[0 1 0] too many resets - make drive offline.
  sd 14:0:0:0: SCSI error: return code = 0x00000001
  end_request: I/O error, dev sdb, sector 0
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  Dev sdb: unable to read RDB block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  sd 14:0:0:0: rejecting I/O to device being removed
  Buffer I/O error on device sdb, logical block 0
  unable to read partition table
  sd 14:0:0:0: Attached scsi disk sdb
  sd 14:0:0:0: Attached scsi generic sg2 type 0

Note the ‘hpt_reset’ and ‘rr174x:[0 2 0] too many resets – make drive offline’ messages. Stuff like that is a good indication that you are screwed. After inspecting the brand-new disks and the SATA cables, I concluded that they were fine. Rebooting the system and hopping into the RocketRAID BIOS also showed the drives without any problem, and rebuilding the RAID array did not help either.

Finally I found some posts through Google on the ‘too many resets – make drive offline’ error that dmesg showed, and it appears that ‘sata_mv’ (the SATA driver for Marvell chips) screws the rr174x driver over – resulting in what looks like a faulty RAID array. Adding ‘blacklist sata_mv’ to ‘/etc/modprobe.d/blacklist’ fixed the problem and now the RAID comes up fine.
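
In concrete terms, something like this should do it (the mkinitrd step only matters if sata_mv was pulled into your initrd):

  echo "blacklist sata_mv" >> /etc/modprobe.d/blacklist
  modprobe -r sata_mv       # unload it if it is already active
  modprobe rr174x           # now the RocketRAID driver gets the drives to itself
  # if sata_mv lives in the initrd, rebuild it so the blacklist sticks at boot
  mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)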

Installing the client tools (after adding some dependencies) was a breeze as well, but you need to start things in the right order. First you load the driver (rr174x) into the kernel using insmod or modprobe, then you start ‘hptsvr’ to provide access to the controller through the driver, and finally you can use ‘hptraid’ to fire up the GUI and actually manage the RAID controller. Leave one of these steps out and you wind up with crashing programs and segfaults.
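
In other words, the happy path looks like this (depending on how you installed the tools, hptsvr may also be started through its init script):

  modprobe rr174x      # 1: the kernel driver
  hptsvr               # 2: the management service that talks to the driver
  hptraid              # 3: the GUI client, which connects to hptsvr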

But even when you get this to work, can you imagine what would happen to your RAID system if you upgraded the kernel? Right, the driver would mismatch or be missing altogether and the RAID would go down again. Luckily, CentOS has a DKMS package in RPMForge – which we can use to fix this. For those who never heard of DKMS: it is the solution to driver mismatches like this. Often (if not always) a module breaking after a kernel upgrade is a simple matter of binary incompatibility – the interfaces and methods did not change; only the fact that the kernel was recompiled causes trouble. With DKMS you register a driver's source in a kernel-independent tree, and when you upgrade the default kernel of your system, DKMS recompiles the driver (if needed) and adds the module to the new kernel.
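
To give you an idea, the dkms.conf inside such a driver package looks roughly like this; the version number and MAKE line are assumptions on my part, so treat it as a sketch rather than the exact contents of the attached RPM:

  # /usr/src/rr174x-2.1/dkms.conf
  PACKAGE_NAME="rr174x"
  PACKAGE_VERSION="2.1"
  BUILT_MODULE_NAME[0]="rr174x"
  DEST_MODULE_LOCATION[0]="/kernel/drivers/scsi/"
  MAKE[0]="make KERNELDIR=/lib/modules/${kernelver}/build"
  CLEAN="make clean"
  AUTOINSTALL="yes"

With that in place, ‘dkms build -m rr174x -v 2.1’ and ‘dkms install -m rr174x -v 2.1’ take care of any new kernel, and AUTOINSTALL makes it happen automatically on kernel upgrades.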

Enough said. If you don’t have RPMForge in your repository list, add it now. Then install the ‘dkms’ package, and finally grab the HighPoint RocketRAID 1742 driver package from below and install it on your CentOS 5.x system to get your RAID controller up and running without a hassle.
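
The short version, with the package file name being an assumption (use whatever the attached RPM is actually called):

  yum install dkms                         # dkms comes from RPMForge
  rpm -ivh dkms-rr174x-2.1-1.noarch.rpm    # the attached driver package
  dkms status                              # rr174x should show up as built and installed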

