AIX error reporting

This article focuses on one of the tools AIX provides for keeping track of system health: the error logging facility. I'll show you how the AIX error logging facility works and how to put it to use.
The Error Logging Subsystem
On most UNIX systems, information and errors from system events and processes are managed by the syslog daemon (syslogd); depending on settings in the configuration file /etc/syslog.conf, messages are passed from the operating system, daemons, and applications to the console, to log files, or to nowhere at all. AIX includes the syslog daemon, and it is used in the same way that other UNIX-based operating systems use it. In addition to syslog, though, AIX also contains another facility for the management of hardware, operating system, and application messages and errors. This facility, while simple in its operation, provides unique and valuable insight into the health and happiness of an RS/6000 system.
The AIX error logging facility components are part of the bos.rte and the bos.sysmgt.serv_aid packages, both of which are automatically placed on the system as part of the base operating system installation. Some of these components are shown in Table 1.
1) errsave and errlast kernel services, errlog subroutine  -  Kernel and application interfaces for passing error information to the /dev/error special file
2) /dev/error  -  Special device file that receives error entries from the kernel and application interfaces
3) /usr/lib/errdemon  -  Daemon, started at system initialization, that monitors /dev/error and controls the error logging process
4) /var/adm/ras/errlog  -  The default error log file
5) /usr/bin/errpt  -  Command used to generate reports from the error log
6) /var/adm/ras/errtmplt  -  The Error Record Template Repository file
7) /usr/bin/errclear  -  Command used to remove entries from the error log

Unlike the syslog daemon, which performs no logging at all in its default configuration as shipped, the error logging facility requires no configuration before it can provide useful information about the system. The errdemon is started during system initialization and continuously monitors the special file /dev/error for new entries sent by either the kernel or by applications. The label of each new entry is checked against the contents of the Error Record Template Repository, and if a match is found, additional information about the system environment or hardware status is added, before the entry is posted to the error log.
The actual file in which error entries are stored is configurable; the default is /var/adm/ras/errlog. That file is in a binary format and so should never be truncated or zeroed out manually. The errlog file is a circular log, storing as many entries as can fit within its defined size. A memory buffer is set by the errdemon process, and newly arrived entries are put into the buffer before they are written to the log to minimize the possibility of a lost entry. The name and size of the error log file and the size of the memory buffer may be viewed with the errdemon command:

     [aixhost:root:/] # /usr/lib/errdemon -l
     Error Log Attributes
     --------------------------------------------
     Log File                 /var/adm/ras/errlog
     Log Size                 1048576 bytes
     Memory Buffer Size       8192 bytes

The parameters displayed may be changed by running the errdemon command with other flags, documented in the errdemon man page. The default sizes and values have always been sufficient on our systems, so I've never had reason to change them.
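For example, to double the log size and buffer size shown above (the values here are arbitrary; -s sets the log size and -B sets the buffer size, per the errdemon man page, so confirm the flags on your release before using them):

     [aixhost:root:/] # /usr/lib/errdemon -s 2097152
     [aixhost:root:/] # /usr/lib/errdemon -B 16384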
Due to use of a circular log file, it is not necessary (or even possible) to rotate the error log. Without intervention, errors will remain in the log indefinitely, or until the log fills up with new entries. As shipped, however, the crontab for the root user contains two entries that are executed daily, removing hardware errors that are older than 90 days, and all other errors that are older than 30 days.
     0 11  *  *  * /usr/bin/errclear -d S,O 30
     0 12  *  *  * /usr/bin/errclear -d H 90
These entries are commented out on my systems, as I prefer that older errors are removed "naturally", when they are replaced by newer entries.
Viewing Errors
Although a record of system errors is a good thing (as most sys admins would agree), logs are useless without a way to read them. Because the error log is stored in binary format, it can't be viewed as logs from syslog and other applications are. Fortunately, AIX provides the errpt command for reading the log.
The errpt command supports a number of optional flags and arguments, each designed to narrow the output to the desired amount. The man page for the errpt command provides detailed usage; Table 2 provides a short summary of the most useful arguments. (Note that all date/time specifications used with the errpt command are in the format of mmddHHMMyy, meaning "month", "day", "hour", "minute", "year"; seconds are not recorded in the error log, and are not specified with any command.)
-a              Generates a detailed report of entries in the error log
-d ErrorClass   Specifies the class of errors to display: H (hardware), S (software), O (operator notice), U (undetermined)
-e Timestamp    Specifies the time before which the errors are displayed
-s Timestamp    Specifies the time after which the errors are displayed
-j Identifier   Specifies a list of error identifiers to include
-k Identifier   Specifies a list of error identifiers to exclude
-c              Displays error log entries concurrently, as they are logged
-t              Generates a report from the error template repository

Each entry in the AIX error log can be classified in a number of ways; the actual values are determined by the entry in the Error Record Template Repository that corresponds with the entry label as passed to the errdemon from the operating system or an application process. This classification system provides a more fine-grained method of prioritizing the severity of entries than does the syslog method of using a facility and priority code. Output from the errpt command may be confined to the types of entries desired by using a combination of the flags in Table 2.
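For example, combining flags from Table 2 (the timestamps and the identifier below are only illustrations):

     # errpt -d H -s 0201000001          # summary of hardware errors logged since Feb 1, 2001
     # errpt -a -j 5DFED6F1              # detailed report of all entries with a single identifier
     # errpt -a -d S,O -e 0131235901     # detailed software and operator entries logged before Jan 31, 2001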

Dissecting an Error Log Entry
Entries in the error log are formatted in a standard layout, defined by their corresponding template. While different types of errors will provide different information, all error log entries follow a basic format.

Here are several examples of error log entry summaries:

     IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
     D1A1AE6F   0223070601 I H rmt3           TAPE SIM/MIM RECORD
     5DFED6F1   0220054301 I O SYSPFS         UNABLE TO ALLOCATE SPACE
                                              IN FILE SYSTEM
     1581762B   0219162801 T H hdisk98        DISK OPERATION ERROR
    
And here is the full entry of the second error summary above:

     LABEL:            JFS_FS_FRAGMENTED
     IDENTIFIER:       5DFED6F1
     Date/Time:        Tue Feb 20 05:43:35
     Sequence Number:  146643
     Machine Id:       00018294A400
     Node Id:          rescue
     Class:            O
     Type:             INFO
     Resource Name:    SYSPFS
     Description
     UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
     Probable Causes
     FILE SYSTEM FREE SPACE FRAGMENTED
     Recommended Actions
           CONSOLIDATE FREE SPACE USING DEFRAGFS UTILITY
     Detail Data
     MAJOR/MINOR DEVICE NUMBER
     000A 0006
     FILE SYSTEM DEVICE AND MOUNT POINT
     /dev/hd9var, /var
    
Monitoring with errreporter
Most, if not all, systems administrators have had to deal with an "overload" of information. Multiple log files and process outputs must be monitored constantly for signs of trouble or required intervention. This problem is compounded when the administrator is responsible for a number of systems. Various solutions exist, including those built into the logging application (e.g., the use of a loghost for syslog messages), and free third-party solutions to monitor log files and send alerts when something interesting appears. One such tool that we rely on is "swatch", developed and maintained by Todd Atkins. Swatch excels at monitoring log files for lines that match specific regular expressions, and taking action for each matched entry, such as sending an email or running a command.
For all of the power of swatch, though, I was unable to set up the configuration to perform a specific task: monitoring entries in the AIX error log, ignoring certain specified identifiers, and emailing the full version of the entry to a specified address, with an informative subject line. So, I wrote my own simple program that performs the task I desired. errreporter (Listing 1) is a Perl script that runs the errpt command in concurrent mode, checks new entries against a list of identifiers to be ignored, crafts a subject line based upon several fields in the entry, and emails the entire entry to a specified address.
errreporter can be run from the command line, though I have chosen to have it run automatically at system startup, with the following entry in /etc/inittab (all on a single line, but broken here, for convenience):

     errrptr:2:respawn:/usr/sec/bin/errreporter -f /usr/sec/etc/errreporter.conf >/dev/console 2>&1
    
Of course, if you choose to use this script, be sure to set the proper locations in your inittab entry. The system must have Perl installed; Perl is included with AIX as of version 4.3.3, and is available in source and compiled forms from numerous Web sites. The script relies only on modules that are included with the base Perl distribution (see Listing 2 for the errreporter.conf file).
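If you would rather not edit /etc/inittab by hand, the same entry can be added and activated with the standard AIX commands (using the example paths from above; adjust them for your installation):

     [aixhost:root:/] # mkitab "errrptr:2:respawn:/usr/sec/bin/errreporter -f /usr/sec/etc/errreporter.conf >/dev/console 2>&1"
     [aixhost:root:/] # telinit q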
Although this script perfectly suits my current needs, there are many areas in which it could be expanded upon or improved. For instance, it may be useful to have entries mailed to different addresses, based upon the entry's identifier. Another useful feature would be to incorporate "loghost"-like functionality, so that a program running on a single server can receive error log entries sent by other systems, communicating via sockets à la the syslog "@loghost" method.
Summary
The AIX Error Logging Facility can provide insight into the workings of your system that is not available on other UNIX platforms. I find it to be just one of the many advantages of AIX in a production environment, and I hope that I have helped to explain this simple yet powerful tool.
In this article, I have touched on some of the more commonly used aspects of the Error Logging Facility in AIX. There are numerous other features and capabilities of this subsystem, including the use of the "diag" command for error log analysis and problem determination, the addition of custom error templates, the redirection of error log entries to and from the syslog daemon, and the use of error notification routines in user-developed code to provide notice and error logging to this subsystem. For more information on those topics, and more detail on the items discussed above, please see the documents listed in the References section below.

System Boot process in AIX

Most users perform a hard disk boot when starting the system for general operations. The system finds all information necessary to the boot process on its disk drive.
When the system is started by turning on the power switch (a cold boot) or restarted with the reboot or shutdown commands (a warm boot), a number of events must occur before the system is ready for use. These events can be divided into the following phases: 

•    ROS kernel init phase

The ROS kernel resides in firmware.
Its initialization phase involves the following steps:
1.    The firmware checks to see if there are any problems with the system board. Control is passed to ROS, which performs a power-on self-test (POST).
2.    The ROS initial program load (IPL) checks the user boot list, a list of available boot devices. This boot list can be altered to suit your requirements using the bootlist command. If the user boot list in non-volatile random access memory (NVRAM) is not valid or if a valid boot device is not found, the default boot list is then checked. In either case, the first valid boot device found in the boot list is used for system startup. If a valid user boot list exists in NVRAM, the devices in the list are checked in order. If no user boot list exists, all adapters and devices on the bus are checked. In either case, devices are checked in a continuous loop until a valid boot device is found for system startup.
Note: The system maintains a default boot list that is stored in NVRAM for normal mode boot. A separate service mode boot list is also stored in NVRAM, and you should refer to the specific hardware instructions for your model to learn how to access the service mode boot list.
3.    When a valid boot device is found, the first record or program sector number (PSN) is checked. If it is a valid boot record, it is read into memory and is added to the IPL control block in memory. Included in the key boot record data are the starting location of the boot image on the boot device, the length of the boot image, and instructions on where to load the boot image in memory.
4.    The boot image is read sequentially from the boot device into memory starting at the location specified in NVRAM. The disk boot image consists of the kernel, a RAM file system, and base customized device information.
5.    Control is passed to the kernel, which begins system initialization.
6.    The kernel runs init, which runs phase 1 of the rc.boot script.
When the kernel initialization phase is completed, base device configuration begins.
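As an aside, the boot list checked in step 2 can be displayed and changed from a running system with the bootlist command; a brief example (output and device names will vary on your system):

     # bootlist -m normal -o
     hdisk0
     # bootlist -m normal hdisk0 cd0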

•    Base device configuration phase

The init process starts the rc.boot script. Phase 1 of the rc.boot script performs the base device configuration.
Phase 1 of the rc.boot script includes the following steps:
1.    The boot script calls the restbase program to build the customized Object Data Manager (ODM) database in the RAM file system from the compressed customized data.
2.    The boot script starts the configuration manager, which accesses phase 1 ODM configuration rules to configure the base devices.
3.    The configuration manager starts the sys, bus, disk, SCSI, and the Logical Volume Manager (LVM) and rootvg volume group configuration methods.
4.    The configuration methods load the device drivers, create special files, and update the customized data in the ODM database.

•    Booting the system

Use these steps to complete the system boot phase.
1.    The init process starts phase 2 running of the rc.boot script. Phase 2 of rc.boot includes the following steps:
          a.    Call the ipl_varyon program to vary on the rootvg volume group.
          b.    Mount the hard disk file systems onto their normal mount points.
          c.    Run the swapon program to start paging.
          d.    Copy the customized data from the ODM database in the RAM file system to the  ODM database in the hard disk file system.
          e.    Exit the rc.boot script.
 After phase 2 of rc.boot, the boot process switches from the RAM file system to the hard disk root file system.

Continuous system-performance monitoring with commands in AIX

The vmstat, iostat, netstat, and sar commands provide the basic foundation upon which you can construct a performance-monitoring mechanism.
You can write shell scripts to perform data reduction on the command output, warn of performance problems, or record data on the status of a system when a problem is occurring. For example, a shell script can test the CPU idle percentage for zero (a saturated condition) and execute another shell script when that condition occurs. The following command records the 15 active processes that consumed the most CPU time, other than the processes owned by the user of the script:

# ps -ef | egrep -v "STIME|$LOGNAME" | sort +3 -r | head -n 15
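And here is a minimal sketch of the idle-test idea described above (not from the vendor documentation): it polls the "id" column of vmstat, which is the 16th field in the sample output shown below, and runs a hypothetical alert script when the CPU is saturated. Match the field position against the vmstat header on your own system before using it.

     while true
     do
         idle=$(vmstat 5 2 | tail -1 | awk '{print $16}')   # "id" column of the last sample
         if [ "$idle" -eq 0 ]; then
             /usr/local/bin/cpu_saturated.sh                # hypothetical script to run on saturation
         fi
     done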

•    Continuous performance monitoring with the vmstat command

The vmstat command is useful for obtaining an overall picture of CPU, paging, and memory usage.
The following is a sample report produced by the vmstat command:
# vmstat 5 2
kthr     memory             page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 1  1 197167 477552   0   0   0   7   21   0 106 1114 451  0  0 99  0
 0  0 197178 477541   0   0   0   0    0   0 443 1123 442  0  0 99  0
Remember that the first report from the vmstat command displays cumulative activity since the last system boot. The second report shows activity for the first 5-second interval.

•    Continuous performance monitoring with the iostat command

The iostat command is useful for determining disk and CPU usage.
The following is a sample report produced by the iostat command:
# iostat 5 2

tty:      tin         tout   avg-cpu:  % user    % sys     % idle    % iowait
          0.1        102.3               0.5      0.2       99.3       0.1    
                " Disk history since boot not available. "


tty:      tin         tout   avg-cpu:  % user    % sys     % idle    % iowait
          0.2        79594.4               0.6      6.6       73.7      19.2    

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk1           0.0       0.0       0.0          0         0
hdisk0          78.2     1129.6     282.4       5648         0
cd1              0.0       0.0       0.0          0         0
Remember that the first report from the iostat command shows cumulative activity since the last system boot. The second report shows activity for the first 5-second interval.
The system maintains a history of disk activity. In the example above, you can see that the history is disabled by the appearance of the following message:
Disk history since boot not available.
To disable or enable disk I/O history with smitty, type the following at the command line:
# smitty chgsys

Continuously maintain DISK I/O history [value]
and set the value to false to disable disk I/O history or to true to enable it. The interval disk I/O statistics are unaffected by this setting.
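The same change can be made without smitty through the iostat attribute of sys0 (verify the attribute name with lsattr on your release before relying on it):

# lsattr -E -l sys0 -a iostat
iostat false Continuously maintain DISK I/O history True
# chdev -l sys0 -a iostat=true
sys0 changed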

•    Continuous performance monitoring with the netstat command

The netstat command is useful in determining the number of sent and received packets.
The following is a sample report produced by the netstat command:
# netstat -I en0 5
    input    (en0)     output           input   (Total)    output
 packets  errs  packets  errs colls  packets  errs  packets  errs colls
 8305067     0  7784711     0     0 20731867     0 20211853     0     0
       3     0        1     0     0        7     0        5     0     0
      24     0      127     0     0       28     0      131     0     0
CTRL C
Remember that the first report from the netstat command shows cumulative activity since the last system boot. The second report shows activity for the first 5-second interval.

•    Continuous performance monitoring with the sar command

The sar command is useful in determining CPU usage.
The following is a sample report produced by the sar command:
# sar -P ALL 5 2

AIX aixhost 2 5 00040B0F4C00    01/29/04

10:23:15 cpu    %usr    %sys    %wio   %idle
10:23:20  0        0       0       1      99
          1        0       0       0     100
          2        0       1       0      99
          3        0       0       0     100
          -        0       0       0      99
10:23:25  0        4       0       0      96
          1        0       0       0     100
          2        0       0       0     100
          3        3       0       0      97
          -        2       0       0      98

Average   0        2       0       0      98
          1        0       0       0     100
          2        0       0       0      99
          3        1       0       0      99
          -        1       0       0      99
The sar command does not report the cumulative activity since the last system boot.

Etherchannel configuration on LINUX

This article covers creating an EtherChannel between a Red Hat Enterprise Linux 5 server and a Cisco Catalyst 3750 switch. This is actually far simpler than it sounds, and can be completed in about ten minutes.
We’ll begin by configuring IEEE 802.3ad dynamic link aggregation (also known as EtherChannel) on the Red Hat Enterprise Linux server. Begin by logging in via SSH, Telnet, or directly on the console itself. I do recommend having direct access to the console, so that should anything go wrong and you lose network connectivity, you’ll be able to easily change things back.
Once logged into the server, switch user to "root" if you’re not already logged in as root. Change directory to "/etc" and modify the "modprobe.conf" file using your favorite text editor such as "vi" (I personally like using "nano"). Add the two bond0 lines shown at the end of the example "modprobe.conf" below to your file. Then save your changes and return to the bash prompt.
Sample /etc/modprobe.conf
alias scsi_hostadapter megaraid_sas
alias scsi_hostadapter1 usb-storage
alias eth0 bnx2
alias eth1 bnx2
alias bond0 bonding
options bond0 miimon=100 mode=4 lacp_rate=1
Next we need to create a network script for the "bond0" interface that we defined above in the "modprobe.conf" file. This will be used to configure the network properties for the virtual adapter. Once again, use your favorite text editor to create a new file called "ifcfg-bond0" in the "/etc/sysconfig/network-scripts" directory. In this file you will define the device name used above ("bond0"), along with the IP address, gateway, network mask, and so on for the virtual adapter. Below is an example.
Sample /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NETWORK=192.168.0.0
NETMASK=255.255.255.0
IPADDR=192.168.0.25
USERCTL=no
GATEWAY=192.168.0.1
TYPE=Ethernet
IPV6INIT=no
PEERDNS=yes
When you’re done configuring the properties of the virtual adapter, save your changes and exit the editor.
The next step is to modify the network script for each adapter that will be added to the EtherChannel. The adapters that we’ll be using in this server are eth0 and eth1. Please note your interfaces may be different, so check before continuing.
Start by modifying "ifcfg-<interface>" using your text editor, where <interface> is the interface name. In this case my file name is "ifcfg-eth0". Add the proper references to the virtual adapter created above ("bond0") and remove any IP information such as the IP address, gateway, netmask, and so on, since that information will be handled by the virtual adapter. Below is an example of the "ifcfg-eth0" file. Note that the MASTER and SLAVE lines are required for the EtherChannel to function properly.
Sample /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
HWADDR=00:11:22:33:44:55
ONBOOT=yes
MASTER=bond0
SLAVE=yes

TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes
Repeat the steps above for each additional interface you add to the Etherchannel.
Sample /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
HWADDR=66:77:88:99:aa:bb
ONBOOT=yes
MASTER=bond0
SLAVE=yes

BOOTPROTO=none
TYPE=Ethernet
USERCTL=no
Now that each physical adapter has been associated with the virtual adapter, restart the network service so that the changes take effect and the bond0 interface comes up.
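A reasonable way to apply and verify the change on Red Hat Enterprise Linux 5 (assuming the stock network init script and bonding driver) is:

# service network restart
# cat /proc/net/bonding/bond0

The /proc/net/bonding/bond0 file should report the 802.3ad aggregation mode and list both slave interfaces once the bond is up.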

Not able to FTP or copy a file with more than 2 GB

A copy error occurs when trying to copy a file larger than 2 GB from your PC to the server.

  Error : cp: Requested a write of 4096 bytes, but wrote only 3584


How to get out of this error


 Scenario :

I was trying to copy a 2.5 GB file from my local system to an AIX system through WinSCP, but after about 1 GB the copy stopped.

Then I tried with FTP; still the same issue.

Then I tried to copy that file to the AIX system from a Linux machine through NFS (after successfully uploading it to the Linux system); still the same issue.

Getting error : " cp: Requested a write of 4096 bytes, but wrote only 3584. "

Then I remembered the user limits.

ulimit is the setting we need to check,

so I changed the ulimit to unlimited temporarily and was able to copy the file:

#ulimit -f unlimited
#ulimit -d unlimited
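The ulimit commands above only affect the current session. To make the change permanent for a particular user on AIX, the per-user values in /etc/security/limits can be updated, for example with chuser (here "testuser" is a placeholder user name; -1 means unlimited):

#chuser fsize=-1 testuser
#chuser data=-1 testuser

The new limits take effect at the user's next login.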


So How to use Ulimit

User limits - limit the use of system-wide resources.

Syntax
      ulimit [-acdfHlmnpsStuv] [limit]


Options

   -S   Change and report the soft limit associated with a resource.
   -H   Change and report the hard limit associated with a resource.

   -a   All current limits are reported.
   -c   The maximum size of core files created.
   -d   The maximum size of a process's data segment.
   -f   The maximum size of files created by the shell (the default option).
   -l   The maximum size that may be locked into memory.
   -m   The maximum resident set size.
   -n   The maximum number of open file descriptors.
   -p   The pipe buffer size.
   -s   The maximum stack size.
   -t   The maximum amount of cpu time in seconds.
   -u   The maximum number of processes available to a single user.
   -v   The maximum amount of virtual memory available to the process.

ulimit provides control over the resources available to the shell and to processes started by it, on systems that allow such control.

The soft limit is the value that the kernel enforces for the corresponding resource. The hard limit acts as a ceiling for the soft limit.

An unprivileged process may only set its soft limit to a value in the range from 0 up to the hard limit, and (irreversibly) lower its hard limit. A privileged process may make arbitrary changes to either limit value.

If limit is given, it is the new value of the specified resource. Otherwise, the current value of the soft limit for the specified resource is printed, unless the `-H' option is supplied.
When setting new limits, if neither `-H' nor `-S' is supplied, both the hard and soft limits are set.
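For example:

      ulimit -Hf            # display the current hard file-size limit
      ulimit -Sf            # display the current soft file-size limit
      ulimit -Sf 1048576    # lower only the soft file-size limit
      ulimit -f unlimited   # with neither -H nor -S, try to set both limits to unlimited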
Restricting per-user processes (-u) can be useful for limiting the potential effects of a fork bomb.

Values are in 1024-byte increments, except for `-t', which is in seconds, `-p', which is in units of 512-byte blocks, and `-n' and `-u', which are unscaled values.
The return status is zero unless an invalid option is supplied, a non-numeric argument other than unlimited is supplied as a limit, or an error occurs while setting a new limit.
ulimit is a bash built-in command.

Configuring MPIO for Virtual AIX client

This document describes the procedure to set up Multi-Path I/O on the AIX clients of
the virtual I/O server.

Procedure:
This procedure assumes that the disks are already allocated to both the VIO servers
involved in this configuration.

· Creating Virtual Server and Client SCSI Adapters
First of all, via the HMC create virtual SCSI server adapters on the two VIO servers, and
then two virtual client SCSI adapters on the newly created client partition, each
mapping to one of the VIO servers' server SCSI adapters.
Here is an example of configuring and exporting an ESS LUN from both of the
VIO servers to a client partition:

· Selecting the disk to export
You can check for the ESS LUN that you are going to use for MPIO by
running the following command on the VIO servers.
On the first VIO server:
$ lsdev -type disk
name status description
..
hdisk3 Available MPIO Other FC SCSI Disk Drive
hdisk4 Available MPIO Other FC SCSI Disk Drive
hdisk5 Available MPIO Other FC SCSI Disk Drive
..
$lspv
..
hdisk3 00c3e35c99c0a332 None
hdisk4 00c3e35c99c0a51c None
hdisk5 00c3e35ca560f919 None
..
In this case hdisk5 is the ESS disk that we are going to use for MPIO.
Then run the following command to list the attributes of the disk that you choose for MPIO:
$lsdev -dev hdisk5 -attr
..
algorithm fail_over Algorithm True
..
lun_id 0x5463000000000000 Logical Unit Number ID False
..
..
pvid 00c3e35ca560f9190000000000000000 Physical volume identifier False
..
reserve_policy single_path Reserve Policy True
Note down the lun_id, pvid, and the reserve_policy of hdisk5.

· Command to change reservation policy on the disk
You see that the reserve policy is set to single_path.
Change this to no_reserve by running the following command:
$ chdev -dev hdisk5 -attr reserve_policy=no_reserve
hdisk5 changed
On the second VIO server:
On the second VIO server too, find the hdisk# that has the same pvid; it could
be a different hdisk number than the one on the first VIO server, but the pvid
should be the same.
$ lspv
..
hdisk7 00c3e35ca560f919 None
..
The pvid of the hdisk7 is the same as the hdisk5 on the first VIO server.
$ lsdev -type disk
name status description
..
hdisk7 Available MPIO Other FC SCSI Disk Drive
..
$lsdev -dev hdisk7 -attr
..
algorithm fail_over Algorithm True
..
lun_id 0x5463000000000000 Logical Unit Number ID False
..
pvid 00c3e35ca560f9190000000000000000 Physical volume identifier False
..
reserve_policy single_path Reserve Policy True
You will note that the lun_id and pvid of hdisk7 on this server are the same as
those of hdisk5 on the first VIO server.
$ chdev -dev hdisk7 -attr reserve_policy=no_reserve
hdisk7 changed


· Creating the Virtual Target Device
Now on both the VIO servers run the mkvdev command using the appropriate
hdisk#s respectively.
$ mkvdev -vdev hdisk# -vadapter vhost# -dev vhdisk#
The above command may fail when run on the second VIO server if the
reserve_policy was not set to no_reserve on the hdisk.
After the command runs successfully on both servers, the same LUN is
exported to the client with the mkvdev command from both of them.
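For example, on the first VIO server in this scenario the command might look like the following (vhost0 and the vtscsi0 device name are assumptions used only for illustration; pick the vhost# that maps to your client partition), and lsmap can be used to confirm the mapping:
$ mkvdev -vdev hdisk5 -vadapter vhost0 -dev vtscsi0
vtscsi0 Available
$ lsmap -vadapter vhost0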


· Check for correct mapping between the server and the client
Double-check via the HMC that the slot numbers of the client's virtual SCSI
adapters map to the correct slot numbers on the servers.
In this example, slot number 4 of the client virtual SCSI adapter maps to
slot number 5 of the VIO server VIO1_nimtb158, and slot number 5 of the
client virtual SCSI adapter maps to slot number 5 of the VIO server
VIO1_nimtb159.


· On the client partition
Now you are ready to install the client. You can install the client using any of
the following methods, described in the Redbook on virtualization at
http://www.redbooks.ibm.com/redpieces/abstracts/sg247940.html:
1. NIM installation
2. Alternate disk installation
3. Installation from CD media
Once you install the client, run the following commands to check for MPIO:
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
# lspv
hdisk0 00c3e35ca560f919 rootvg active
# lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1


· Dual Path
When one of the VIO servers goes down, the path coming from that server
shows as failed with the lspath command.
# lspath
Failed hdisk0 vscsi0
Enabled hdisk0 vscsi1


· Path Failure Detection
The path shows up in the "failed" state even after the VIO server is up
again. We need to either change the path status back to "enabled" with the
chpath command, or set the attributes hcheck_interval and hcheck_mode to
60 and nonactive respectively for path failure and recovery to be detected automatically.

· Setting the related attributes
Here is the command to be run on the AIX client (as root) to set the hcheck_interval attribute:
# chdev -l hdisk# -a hcheck_interval=60 -P
The VIO AIX client needs to be rebooted for the hcheck_interval attribute to take
effect.

· EMC for Storage
In case of using an EMC device as the storage device attached to the VIO servers,
make sure of the following:
1. PowerPath version 4.4 is installed on the VIO servers.
2. The hdiskpower devices are created and are shared between both the VIO
servers.

· Additional Information
Another thing to take note of is that you cannot have the same name for the
virtual SCSI server adapter and the virtual target device. The mkvdev command
will fail if the same name is used for both.
$ mkvdev -vdev hdiskpower0 -vadapter vhost0 -dev hdiskpower0
Method error (/usr/lib/methods/define -g -d):
0514-013 Logical name is required.
The reserve attribute is named differently for an EMC device than for an ESS
or FAStT storage device; for EMC it is "reserve_lock".
Run the following command as padmin for checking the value of the
attribute.
$ lsdev -dev hdiskpower# -attr reserve_lock
Run the following command as padmin for changing the value of the attribute.
$ chdev -dev hdiskpower# -attr reserve_lock=no

· Commands to change the Fibre Channel Adapter attributes
Also change the following attributes of the fscsi# device: fc_err_recov to "fast_fail"
and dyntrk to "yes".
$ chdev -dev fscsi# -attr fc_err_recov=fast_fail -attr dyntrk=yes -perm
The reason for changing the fc_err_recov to “fast_fail” is that if the Fibre
Channel adapter driver detects a link event such as a lost link between a storage
device and a switch, then any new I/O or future retries of the failed I/Os will be
failed immediately by the adapter until the adapter driver detects that the device
has rejoined the fabric. The default setting for this attribute is 'delayed_fail'.
Setting the dyntrk attribute to “yes” makes AIX tolerate cabling changes in the
SAN.
The VIOS needs to be rebooted for fscsi# attributes to take effect.
