Ranching technology post 1



Every once in a while I like to post an update on how I use computer and electronic technology to assist with my agricultural pursuits.  Previous posts on technology were not explicitly called out as such and were often mixed in with other, non-technical things.  In an effort to improve the organization and searchability of my posts, I've decided to distinguish such posts more clearly in the future.  This is my first post under that label.

The main advancement I've made recently has been the resolution of a surveillance system software glitch that has plagued me since I first began using the system over a year ago.  For unknown reasons, the system would from time to time decide to stop sending surveillance and security information and refuse to accept any remote commands.  During these times, only a manual power reset would get the system operating normally again.  Needless to say, that was not an acceptable situation.

Because the malfunction had bothered me for such a long time, I've decided to write in greater depth than I normally would, in hopes of helping others who may experience similar issues.  For the non-technical readers, I apologize in advance for the detail that follows.  For background on the system and how it works, visit my previous post on the subject.

Initially, I was completely clueless about what was causing the problem, but over time I began noticing patterns in the malfunctions that helped me focus my troubleshooting.  First, I noticed a tendency for the system to malfunction mostly when I had left the ranch to return to my home in northern Virginia (my normal pattern is to work for a block of time at the ranch in southern Virginia and then return to my home in northern Virginia for a few days to rest).  Usually the malfunction would occur late on the night of the first day I had returned home.  At first this didn't make sense to me, until my wife, Jess, pointed out that I would often plan my return trips around bad weather because it sucks to work in the rain or snow.  The surveillance system is solar powered, and during bad weather the heavy cloud cover reduces the charge reaching the system's batteries.  The system would continue operating on the batteries for a while, until the charge became too low and the charge controller shut the power supply off.  My initial theory was that there was a problem with the power system itself, but thorough diagnostics ruled that out: the power system was operating normally and would restore power to the surveillance system once enough sunlight was available.
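As a side note for anyone chasing a similar pattern, the reboot history the system keeps in /var/log/wtmp is an easy way to see after the fact when the machine went down and came back up (assuming the log survives the outages):

# list recent reboot and shutdown records from /var/log/wtmp
last -x reboot shutdown | head -n 10

# how long the system has been up since it last came back
uptime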

Additional malfunctions led to additional theories and tests, which eventually narrowed the list of possible causes to the component I refer to as the "main computer" in my previously referenced post.  This main computer is a Raspberry Pi, basically a tiny Linux computer about the size of a baseball card.  Further testing indicated that the malfunction was not hardware-based, so I started to look through the system logs.
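For readers who want to do similar digging, the commands below are the sort of searches involved; I'm assuming standard Raspbian/Debian log locations under /var/log:

# scan the main system log for network- and firewall-related messages
sudo grep -iE 'dhclient|eth0|iptables' /var/log/syslog

# check the rotated, compressed copies of the log as well
sudo zgrep -iE 'dhclient|eth0|iptables' /var/log/syslog.*.gz

# kernel messages since the last boot can help rule hardware problems in or out
dmesg | tail -n 50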

At this point I'm going to have to go "full-nerd" in this post and simply list what I did without a lot of explanation because it would take too long to do so.  I found very little in syslog or any other log to indicate the cause of the malfunction except the following: 

Mar 28 22:12:29 system dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Mar 28 22:12:29 system dhclient: send_packet: Operation not permitted
A problem of this type with dhclient suggested that something was awry with iptables (the firewall on this Linux system).  Ideally I would run sudo iptables -L -n -v to see which rules were actually loaded, but I couldn't do that while the system was unreachable during a malfunction.  To get around this I wrote the following Python script:

import subprocess, os, time

# directory where the report text files are written
homedir = '/home/user'

def findreport(homedir):
    # Append to any existing report file; if none exists yet, create a new,
    # timestamped one and hand ownership to the regular (non-root) user.
    reported = 'undone'
    reports = os.listdir(homedir)
    for r in reports:
        rname, rext = os.path.splitext(r)
        if rext == '.txt' and rname[0:4] == 'repo':
            textfile = os.path.join(homedir, r)
            writereport(textfile)
            reported = 'done'
    if reported == 'undone':
        # set the local time zone so the timestamp in the file name makes sense
        os.environ['TZ'] = 'EST+05EDT,M4.1.0,M10.5.0'
        time.tzset()
        timeint = 'report_starting_' + str(int(time.time())) + '.txt'
        newtextfile = os.path.join(homedir, timeint)
        textfile = open(newtextfile, 'w')
        textfile.close()
        os.chown(newtextfile, 1001, -1)   # give the file to uid 1001, leave the group unchanged
        writereport(newtextfile)

def writereport(textfile):
    # run iptables and capture its output and any error messages
    # (listing the rules requires root, which is why this ends up running from rc.local)
    p = subprocess.Popen(['iptables', '-L', '-n', '-v'],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    os.environ['TZ'] = 'EST+05EDT,M4.1.0,M10.5.0'
    time.tzset()
    timestr = '\n\n' + time.strftime('%X %x %Z') + '\n'
    fullreport = timestr + out + err
    report = open(textfile, 'a')
    report.write(fullreport)
    report.close()
    print "report written to " + textfile + ". "

findreport(homedir)
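Before automating anything, the script can be run once by hand to make sure it actually produces a report (it needs root in order to read the iptables rules):

sudo python /home/user/pathtoscript/iptablesrecorder.py
cat /home/user/report_starting_*.txt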
 
At first I attempted to set the script to run every two hours with cron, but quickly realized that cron wouldn't run sudo commands well, so I created a second file called iptablesrecorder.sh and saved the following to it:

#!/bin/sh

# run the iptables snapshot script, then wait two hours before running it again
sleeprunrecord() {
        /usr/bin/python /home/user/pathtoscript/iptablesrecorder.py
        sleep 7200
#       sleep 120
}

# repeat forever
while :
do
        sleeprunrecord
done

Finally, I placed the line /bin/sh /home/user/pathtoscript/iptablesrecorder.sh in the rc.local file and rebooted the system.
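One caution for anyone copying this approach: because the wrapper script loops forever, an rc.local that runs its commands one after another (and ends with exit 0, as the stock Raspbian file does) can end up stuck waiting on it.  Backgrounding the command with & is a simple way to avoid that; treat this as a sketch rather than a required step, since behavior varies between init setups:

# inside /etc/rc.local, before the final "exit 0"
/bin/sh /home/user/pathtoscript/iptablesrecorder.sh &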

The result of all this was that the system automatically checked the status of iptables every two hours and saved the output of each check to a text file that I could review after the next malfunction.

When the next malfunction occurred, I eagerly accessed the surveillance system's main computer and, by comparing the computer's syslog entries with the contents of the text file generated by iptablesrecorder.py, was able to determine what was going on.  As my wife had suggested, bad weather was leading to low battery power levels, which eventually caused a power outage late at night.  At some point (either the next morning or whenever enough sunlight was available) power would be restored and the main computer would reboot.  For some reason, possibly associated with the preceding sudden power loss, iptables would fail to load the custom rules I had saved with iptables-persistent during this reboot and would instead load a different set of rules that completely locked down the system.

A search of different directories uncovered a second iptables rules file, which I either didn't know about or had forgotten.  The content of this "alternative" rules file corresponded directly with the outputs captured by iptablesrecorder.py during the malfunction, suggesting that this was the source of the problem.  Rather than delete the alternative rules file, I decided to replace its content with my intended rules.  While this solution does not answer the question of why iptables sometimes loads rules from one file and at other times from the other, it appears to have ended the malfunctions, which is the most important thing.
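For anyone facing the same symptom, here is a rough sketch of the fix.  On Debian-based systems iptables-persistent keeps its IPv4 rules in /etc/iptables/rules.v4; the second path below is only a placeholder, since the location of the stray rules file will depend on your system:

# with the intended rules currently loaded, save them over the iptables-persistent file
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'

# ...and over the stray "alternative" rules file so both copies agree
sudo sh -c 'iptables-save > /path/to/alternative.rules'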

The bottom line for people using Linux with iptables: if iptables appears to "forget" your custom rules after a reboot even though you used iptables-persistent, you might want to search your directories for another rules file that is competing with your custom rules file.
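A couple of starting-point searches along those lines (file names and locations vary from system to system, so treat these as suggestions rather than a definitive list):

# look for likely iptables rules files under /etc
sudo find /etc -type f -iname '*rules*' 2>/dev/null

# find boot-time scripts that call iptables-restore, which shows which file actually gets loaded
sudo grep -rl 'iptables-restore' /etc 2>/dev/null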
