Core iTOps Tube

Wednesday, 16 May 2012

My attempt is FAR too slow

I have hundreds of data files, each with various values. The values are one per line and are identified by the field name in the first field of the line, which is separated from the actual value by a colon. An example from one of the files looks like this:



NAME: Bob Jones
ADDRESS: 123 Main Street
CITY: Omaha
STATE: Nebraska
1_TIMESTAMP: 1234567890
2_TIMESTAMP: 2012-05-15 10:15:20


So here's my dilemma - SOME of the timestamps are in epoch format (seconds since 1970) and SOME are already in human-readable format (YYYY-MM-DD HH:MM:SS). The format cannot be determined from the timestamp field name - the values are truly mixed. I need to convert ALL of the epoch timestamps into human-readable ones, while NOT changing any other data in the files. I have this working in a shell script, but again, I have hundreds of files and literally millions of lines to parse, and my script has so far finished only 15 files in 50 minutes. How can I speed this up?
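For a single value the conversion itself is simple enough with GNU date (the epoch value below is just the one from my example file; the output depends on the local timezone):

Code:

# Convert one epoch value to YYYY-MM-DD HH:MM:SS (GNU date)
date -d @1234567890 +"%F %T"
# prints 2009-02-13 23:31:30 in UTC; other timezones will differ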



My script is below...




Code:




for DETAIL_FILE in $(find ${MY_DATAFILE_LOCATION} -type f)
do
  # Collect the unique epoch timestamps in this file (lines containing "-"
  # are already human readable and are skipped)
  grep -a TIMESTAMP ${DETAIL_FILE} | cut -d: -f2 | grep -v "-" | sed 's/^ *//' | sort -un > ${MY_DATAFILE_LOCATION}timestamp.txt

  while read UNIQUE_TIMESTAMP
  do
    # Convert the epoch value, then rewrite every occurrence in the file
    HUMAN_READABLE_TIMESTAMP=$(date +%F" "%T -d@${UNIQUE_TIMESTAMP})
    sed -i "s/TIMESTAMP: ${UNIQUE_TIMESTAMP}/TIMESTAMP: ${HUMAN_READABLE_TIMESTAMP}/g" ${DETAIL_FILE}
  done < ${MY_DATAFILE_LOCATION}timestamp.txt
done
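I suspect the real bottleneck is that sed -i rewrites the entire file once for every unique timestamp, so each file gets scanned thousands of times. Would a single pass per file be the right approach? Here is a rough sketch of what I'm thinking, assuming GNU awk is available for strftime() (the temp-file swap is just so I don't have to rely on gawk's in-place option):

Code:

for DETAIL_FILE in $(find ${MY_DATAFILE_LOCATION} -type f)
do
  # One pass per file: only touch TIMESTAMP lines whose value is all digits
  gawk '
    /TIMESTAMP: *[0-9]+ *$/ {
      epoch = $0
      sub(/^[^:]*: */, "", epoch)                               # keep just the epoch value
      sub(/[0-9]+ *$/, strftime("%Y-%m-%d %H:%M:%S", epoch))    # strftime is a gawk extension
    }
    { print }
  ' ${DETAIL_FILE} > ${DETAIL_FILE}.tmp && mv ${DETAIL_FILE}.tmp ${DETAIL_FILE}
done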





