# Thesis Journal

## Quarter 1 (Spring 19)

### 2019-04-15

* Background research
  * "Timing Analysis of Keystrokes and Timing Attacks on SSH" - Song et al.
    * Basically the exact same attack applied to passwords. Very useful.
  * Found several papers using acoustics to determine keystrokes
  * Found useful Wireshark filter for SSH keystrokes (tcp.dstport == 22 and tcp.flags.push == 1)
* Wrote proposal

### 2019-04-16

* Submitted thesis proposal to DeBruhl
* Discussed logistics of testing
  * Began arrangements with Mammen on using 357 students.
    * Still need to submit paperwork
  * Need to talk to Nico about O.S.

### 2019-04-18

* Researched NLP and Hidden Markov Models

### 2019-04-19

* Requested VirtualBox be installed on CSL machines
* Acquired V.M. for router
  * Decided to use a V.M. on AWS to get realistic network variance/latency
  * Plan: Set up as router
    * Concern: If I make it a proxy router, it may be abused. See if I can transparently route or limit to only my subjects' VMs.
    * Concern: If I transparently route, BGP may break my MitM. Might not be an issue if I only care about the upstream.
  * Plan: Open necessary ports
* Researched methods of tracking/tagging SSH connections
  * Surprisingly difficult to track origin machine - Mostly due to University Wi-Fi NAT
  * Nothing in the packet metadata uniquely and consistently identifies a victim
  * SSH man page describes session variables which may be useful
    * SSH_CONNECTION contains the victim port from the remote host perspective
    * I can use the timestamp + src port to link a connection to an ID
    * Tried printing SSH_CONNECTION to local file on connection init (failed)
    * Tried printing SSH_CONNECTION to remote file on init (I don't have access to remote file)
  * Started working with ncat to send ID + time + port to router
    * I can then join on time + port to assign each SSH flow to an ID
    * ncat unencrypted by default, privacy concern + potential abuse
    * Unix servers uninstalled ncat, not viable
  * Plan: write a basic web server, TLS encrypt, limit to only accept Unix server connections, use curl to send ID
    * Concern: Unix server IPs may not be static. Unlikely but possible.
* Found several simple keyloggers
  * Plan: Find one which can monitor one process at a time. Must also include timestamps

### 2019-04-20

* Realized that using a V.M. on the internet as the router doesn't work because I can't force it to be the default gateway
  * Try making router impersonate Unix VMs by forwarding connection and adding routing rule to victim VM
    * How do I tell the difference between a legitimate SSH connection and one to be forwarded?
      * Use a different port for real SSH connections?
  * Might be easier to just use VPNs and hook on new connections

### 2019-04-21

* Gathered preliminary data on Unix commands
  * Used my Ubuntu VM sans emacs, which I installed afterwards
  * 1828 commands installed by default, most of them length 8 (normal-ish distribution around it)
    * A lot of these seem like administrative commands
  * 315 non-privileged commands, most of them at length 7

### 2019-04-22

* Began IRB paperwork
* Began writing prompts for students

### 2019-04-23

* Plan: Talk to Lupo/Pentoja about Fall Quarter
* Spoke with Debbie Hart
  * Mention that I have no influence over grading status
  * Mention whether or not there is a difference in ability to complete homework
  * Ensure students don't feel pressured to participate
  * Possibly two consent forms
  * Responses are not used, only information generated by the system
    * If any part of analysis uses plain responses, submit an IRB
* Submitted IRB paperwork

### 2019-04-24

* Wrote keylogger script for SSH
  * Concern: No way to not trace the password entered into SSH. Might have to instruct students on RSA keys.
* Wrote up Context-Aware SSH docs

### 2019-04-27

* Extensively tested keylogger

### 2019-04-29

* IRB approved
* Submitted revised consent form
* Talked to Tedd about getting VirtualBox on lab machines
* Tested OpenVPN
  * Measured added latency (8.8.8.8 reference point): ~12.5ms
* Wrote tcpdump filter for only traffic SSHing into another system
  * Considered: Fixing to only Unix server
    * I don't know if the IPs will change and it's still potentially valid data
  * Considered: Filtering for only PSH packets (all keystrokes are PSH)
    * I might need the extra information later so I'll hold on to it.
* Tested pairing network packets to keystrokes
  * Difference between key and packet observation: ~11.8ms
  * First SYN packet is within 100ms of SSH starting in log

### 2019-04-30

* Discussed thesis progress with DeBruhl
  * Password entry is questionable
    * Maybe write script that generates SSH key if asked for password?
    * Might have to just filter it out
  * Decided port mapping might be unnecessary
  * Ethan (357 student) expressed interest in project
  * DeBruhl offered to do 400 in Fall
* Set up cron to automatically run packet tap on reboot + AppArmor permissions
* Server suddenly cannot connect to Unix1
  * Forgot to save iptables rule

### 2019-05-01

* Started writing script to automatically pair keylog files to packet flows
  * Got to the point where it could match the start of a file to a TCP SYN

### 2019-05-02

* Finished and tested script
  * Was able to match all packets with 50ms time difference

### 2019-05-03

* Tested multiple keylog files.
  * Discovered that different keylogs have different delays.
  * Delay within one log file fairly consistent

### 2019-05-06

* Created scripts which copy keylog file to router VM.

### 2019-05-07

* Changes in delay attributable to AWS server instances changing
* Tested two SSH sessions at same time
  * Found that it uses fd 5 instead of 4 - Not sure why
* Found issue with VM routing
  * Apparently, upon reboot it adds tun0 without a VPN being turned on, then tries to route all traffic through the inactive device
    * --No idea why.--
    * Found old configuration. systemctl task was trying to open VPN separately.
* Set up VPN to automatically enable when VM boots
* Plan: Separate out each flow into a separate pcap

### 2019-05-08

* Worked on separating each TCP flow into a separate pcap
  * Apparently editcap can't do this on its own so I'm writing my own utility for this

### 2019-05-09

* Deployed VM to Unix machines upstairs
  * Apparently most people don't have enough space to house VMs
    * Might just have to make VMs smaller
    * Tested stripping down unneeded packages, didn't do much.
    * Might just have to use headless Ubuntu
* Tested running Dataset 1 with Griffian
  * Issues with DNS resolution on VPN
  * Packets did not seem to capture --(may have just not flushed yet)--
    * Packets confirmed not captured.
  * Total time ~20 (gave longer than expected answers)
* Set up Lubuntu VM (~1/2 the size)

### 2019-05-10

* Fixed issues with VirtualBox version differences (CSL was running 6, I was running 5)
* Gathered data from Lucy
* Wrote script to automatically filter the packets before sending them to server.

### 2019-05-13

* Got VMs working on CSL computers - Had to install to /tmp for space reasons
* Tested client configuration with 3 people simultaneously
  * Sequentially each of us had the connection break
    * Each after ~15 minutes, each re-established the connection before the next disconnect occurred
* Added prompt to script to give students a chance to review/approve data before submitting

### 2019-05-14

* Discussed progress with DeBruhl
* Worked on the disconnect problem
  * --Seems like it happens after every 15 minutes almost exactly--
  * Can't seem to make it happen at all now.
* Gathered data from a few students:
  * One had a disconnect but no noticeable issues
  * Another had a complete disconnect at the very end
    * Testing was completely finished and I was able to manually upload the data
    * Captured log of what happened
    * OpenVPN seems to be detecting itself as a replay attack after network goes down
      * Solution: Set up NTP server (virtual machine system clock is way off)
      * Solution: Use TCP

### 2019-05-15

* Talked with Nico about testing O.S./S.S. sections
* Found bug that caused network to disconnect
  * Network goes down -> link device removed -> route to VPN through device removed -> VPN traffic has no route
  * It takes 2 minutes for OpenVPN to detect the network is broken
  * Once OpenVPN tries to renegotiate, fixing the connection causes errors.
    Have to fully restart OpenVPN
  * Can fix within first two minutes by adding the route back

### 2019-05-16

* Gathered more data from 357 students
* Data velocity very slow
  * Option: Bigger test - Might dissuade more people
  * Option: Skip to dataset 2
  * Option: More classes
  * Option: Reduce dataset requirements

### 2019-05-17

* Asked for volunteers from O.S.
* Met with 357 students from Griffian's section/Elie

### 2019-05-18

* Checked that pcaps, log files, and consent forms line up
  * Because of time zone offsets, some logs fall on a day boundary

### 2019-07-20

* Finished refactoring packet matcher to work with flow-separated pcaps.
* Began looking over data
  * Seems like a lot of files have incomplete data - may have to rerecord all of it.

### 2019-07-27

* Looked into issues with files
  * At least one file seems like it has all the correct data and timestamps, but poor matches
    * Seems packet-matcher has issues
    * Fixed. Seems like it was a problem with my keylog pattern matching. FD is not reliable.
  * Some files still fail
    * Timestamps are way off. Some data has timestamps way after my last recorded flow
    * Possibilities:
      * Flow separator did not split off a file for these flows (best case)
      * Unix lab system clocks are wrong (unlikely but recoverable case)
      * Recording system did not capture data (worst case)
    * Investigation:
      * See if I have a SYN packet for one of my problem flows
        * No SYN packet found. Nothing close. Looking at server to see if I have data at the source.
        * No packets recorded after 2019-05-21.
        * I have several flows from 2019-05-23 that were missed.
        * Other flows are from way earlier. May be a different problem.
  * Some packet captures empty. Seems data capture system needs serious work.

### 2019-08-11

* Emailed Ethan
* Emailed DeBruhl
* Bought Amazon gift cards
* Looked for patterns in missing data
  * None found
  * May be a failure in data collection or in data parsing
  * On the server nothing from day 23. 3 keylogs from that day.
  * Maybe it has to do with the connection dropping randomly?

### 2019-08-22

* Collected data from work on personal server
  * Collection done in lab to best recreate the scenario under which students worked.

### 2019-09-08

* Examined new data from 8-22
  * Data collected fine at first, then truncated
    * Truncation happened at midnight UTC (within 1 second), after exactly 2000 keystrokes
    * Both those numbers are extremely suspect, and that they happen to coincide is unfortunate
    * Will investigate both these numbers
      * Maybe data rotation did not pick up the existing SSH connection (unlikely)
      * SSH may have undergone some kind of renegotiation (should be in logs if this is the case)
  * Data is otherwise usable (only 20% data loss)

### 2019-09-12

* Examined script for rotating logs
  * Uses SIGKILL
  * Tested SIGKILL, doesn't flush buffer; SIGINT does
  * Script now uses SIGINT
* Switched server to use PST so that logs rotate when no one is using it, just in case
* Will test changes tomorrow

### 2019-09-23

* Above bullet was a lie. Testing today.
* Set up URL for raffle Captcha.
* Submitted flier for IRB review.

### 2019-09-24

* Worked out confusion with IRB about raffle
* Discussed plans for the quarter with DeBruhl
* Discussed work/plans for the project with Ethan
* Examined files from 09-23
  * Files still truncated. Unknown reasons.

### 2019-09-25

* Emailed professors asking permission to advertise in their class
* Server hasn't been rebooted in 44 days.
  * Script was updated 2 weeks ago. New script might not be running.
  * Rebooted today.
  * Will test that new script is working tonight.

### 2019-09-26

* Looked at matched file generated from yesterday
  * Not all the keys matched
  * All packets seem to be accounted for.
    Nothing visibly truncated
  * Possible events happening:
    * Some packets grouped together
    * Some packets not recorded for unexplained reasons
    * Some packets don't match the standard pattern
* Updated script to tag guided/freeform work

### 2019-09-27

* Handed new VM to Tedd for deployment

### 2019-09-30

* Checked if new VMs deployed (they are not).
* Upgraded packet matching script: succeeded on 9-25 data.
  * Also seems to work with prior data formerly thought to be corrupted
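
Note on the packet matching idea: the sketch below is a minimal illustration of pairing keylog timestamps with packet timestamps, not the actual matching script. It assumes both inputs are sorted lists of timestamps in seconds, that the key-to-packet delay is roughly constant within one log (as observed on 2019-05-03), and it reuses the 50ms figure from 2019-05-02 as a tolerance. The function names and the median-based delay estimate are hypothetical choices made for illustration.

```python
from bisect import bisect_left
from statistics import median


def estimate_delay(key_times, packet_times):
    """Median gap from each keystroke to the first packet at or after it.

    Assumption: delay is roughly constant within a single keylog file.
    """
    gaps = []
    for t in key_times:
        i = bisect_left(packet_times, t)
        if i < len(packet_times):
            gaps.append(packet_times[i] - t)
    return median(gaps) if gaps else 0.0


def match_keys_to_packets(key_times, packet_times, tolerance=0.050):
    """Greedily pair each keystroke with the next unused packet.

    A packet is accepted when it lies within `tolerance` seconds of the
    keystroke time plus the estimated delay. Keystrokes with no such
    packet are reported as unmatched (e.g. grouped or missing packets).
    """
    delay = estimate_delay(key_times, packet_times)
    pairs, unmatched = [], []
    j = 0
    for t in key_times:
        expected = t + delay
        # Skip packets that are too early to belong to this keystroke.
        while j < len(packet_times) and packet_times[j] < expected - tolerance:
            j += 1
        if j < len(packet_times) and abs(packet_times[j] - expected) <= tolerance:
            pairs.append((t, packet_times[j]))
            j += 1
        else:
            unmatched.append(t)
    return delay, pairs, unmatched
```

Unmatched keystrokes are returned rather than dropped so that cases like grouped or unrecorded packets (see 2019-09-26) can be inspected by hand.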