Thesis Journal
Quarter 1 (Spring 19)
2019-04-15
- Background research
- "Timing Analysis of Keystrokes and Timing Attacks on SSH" - Song et al.
- Basically the exact same attack applied to passwords. Very useful.
- Found several papers using acoustics to determine keystrokes
- Found useful Wireshark filter for SSH keystrokes (tcp.dstport == 22 and tcp.flags.push == 1)
- Wrote proposal
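As a rough illustration of what the Wireshark display filter above tests, here is a minimal Python sketch that applies the same two conditions to a raw TCP header. It assumes the input bytes begin at the TCP header (IP header already stripped); the function name is hypothetical.

```python
import struct

TCP_PSH = 0x08  # PSH bit within the TCP flags byte


def is_ssh_push_to_server(tcp_header: bytes) -> bool:
    """Return True if a raw TCP header goes to port 22 with PSH set.

    Same two conditions as the Wireshark display filter
    `tcp.dstport == 22 and tcp.flags.push == 1`.
    """
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # Bytes 0-1: source port, bytes 2-3: destination port (big-endian).
    _src_port, dst_port = struct.unpack("!HH", tcp_header[:4])
    # Byte 13 holds the CWR/ECE/URG/ACK/PSH/RST/SYN/FIN flag bits.
    flags = tcp_header[13]
    return dst_port == 22 and bool(flags & TCP_PSH)
```

Server-to-client replies (source port 22) are excluded, which matches the filter's focus on keystrokes travelling upstream.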
2019-04-16
- Submitted thesis proposal to DeBruhl
- Discussed logistics of testing
- Began arrangements with Mammen on using 357 students.
- Still need to submit paperwork
- Need to talk to Nico about O.S.
2019-04-18
- Researched NLP and Hidden Markov Models
2019-04-19
- Requested VirtualBox be installed on CSL machines
- Acquired V.M. for router
- Decided to use a V.M. on AWS to get realistic network variance/latency
- Plan: Set up as router
- Concern: If I make it a proxy router, it may be abused. See if I can transparently route or limit it to only my subjects' VMs.
- Concern: If I transparently route, BGP may break my MitM. Might not be an issue if I only care about the upstream.
- Plan: Open necessary ports
- Researched methods of tracking/tagging SSH connections
- Surprisingly difficult to track origin machine - Mostly due to University Wi-Fi NAT
- Nothing in the packet meta-data uniquely and consistently identifies a victim
- SSH man page describes session variables which may be useful
- SSH_CONNECTION contains the victim port from the remote host perspective
- I can use the timestamp + src port to link a connection to an ID
- Tried printing SSH_CONNECTION to local file on connection init (failed)
- Tried printing SSH_CONNECTION to remote file on init (I don't have access to remote file)
- Started working with ncat to send ID + time + port to router
- I can then join on time + port to assign each SSH flow to an ID
- ncat unencrypted by default, privacy concern + potential abuse
- Unix servers uninstalled ncat, not viable
- Plan: write a basic web server with TLS, limit it to only accept connections from the Unix servers, and use curl to send the ID
- Concern: Unix server IPs may not be static. Unlikely but possible.
- Found several simple keyloggers
- Plan: Find one which can monitor one process at a time. Must also include timestamps
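A minimal sketch of the planned join, assuming each session reports its subject ID, a Unix timestamp, and the client port from SSH_CONNECTION (whose four space-separated fields are client IP, client port, server IP, server port), and assuming the router sees the same source port the server reports. The record types, function names, and the 5-second tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    subject_id: str
    time: float  # Unix time when the session reported itself
    port: int    # client port taken from SSH_CONNECTION


@dataclass
class Flow:
    src_port: int
    start: float  # Unix time of the flow's first SYN at the router


def parse_ssh_connection(value: str) -> tuple[str, int, str, int]:
    """Split SSH_CONNECTION into (client_ip, client_port, server_ip, server_port)."""
    client_ip, client_port, server_ip, server_port = value.split()
    return client_ip, int(client_port), server_ip, int(server_port)


def assign_ids(reports: list[Report], flows: list[Flow],
               tolerance: float = 5.0) -> dict[int, str]:
    """Map flow index -> subject ID, joining on port and nearest start time."""
    assigned: dict[int, str] = {}
    for report in reports:
        best: Optional[int] = None
        best_gap = tolerance
        for i, flow in enumerate(flows):
            # Skip flows already claimed or on a different port.
            if i in assigned or flow.src_port != report.port:
                continue
            gap = abs(flow.start - report.time)
            if gap <= best_gap:
                best, best_gap = i, gap
        if best is not None:
            assigned[best] = report.subject_id
    return assigned
```

Joining on the port alone would break once a port is reused; the time tolerance is what separates two flows that happen to share a port.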
2019-04-20
- Realized that using a V.M. as a router while it is on the internet doesn't work because I can't force it to be the default gateway
- Try making the router impersonate the Unix VMs by forwarding connections, and add a routing rule to the victim VM
- How do I tell the difference between a legitimate SSH connection and one to be forwarded?
- Use a different port for real SSH connections?
- Might be easier to just use VPNs and hook on new connections
2019-04-21
- Gathered preliminary data on Unix commands
- Used my Ubuntu VM (sans emacs, which I installed afterwards)
- 1828 commands installed by default, most of them length 8 (normal-ish distribution around it)
- A lot of these seem like administrative commands
- 315 non-privileged commands, most of them at length 7
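The command counts above could be reproduced roughly like this: list every executable in a set of directories and count names by length. This is a sketch, not necessarily how the numbers were gathered; treating /usr/bin and /bin as the non-privileged set (versus /usr/sbin and /sbin) is an assumption, and the function name is hypothetical.

```python
import os
from collections import Counter


def command_length_histogram(path_dirs: list[str]) -> Counter:
    """Count executable names found in the given directories by name length."""
    names: set[str] = set()
    for directory in path_dirs:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # missing or unreadable directory
        for name in entries:
            full = os.path.join(directory, name)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                names.add(name)  # de-duplicate names present in several dirs
    return Counter(len(name) for name in names)
```

For example, `command_length_histogram(["/usr/bin", "/bin"]).most_common(1)` gives the most frequent name length in those directories.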
2019-04-22
- Began IRB paperwork
- Began writing prompts for students
2019-04-23
- Plan: Talk to Lupo/Pentoja about Fall Quarter
- Spoke with Debbie Hart
- Mention that I have no influence over grading status
- Mention whether or not there is a difference in ability to complete homework
- Ensure students don't feel pressured to participate
- Possibly two consent forms
- Responses are not used, only information generated by the system
- If any part of the analysis uses plain responses, submit to the IRB
- Submitted IRB paperwork
2019-04-24
- Wrote keylogger script for SSH
- Concern: No way to avoid tracing the password entered into SSH. Might have to instruct students on RSA keys.
- Wrote up Context-Aware SSH docs
2019-04-27
- Extensively tested keylogger
2019-04-29
- IRB approved
- Submitted revised consent form
- Talked to Tedd about getting VirtualBox on lab machines
- Tested OpenVPN
- Measured added latency (8.8.8.8 reference point): ~12.5ms
- Wrote tcpdump filter for only traffic SSHing into another system
- Considered: Restricting it to only the Unix server
- I don't know if the IPs will change and it's still potentially valid data
- Considered: Filtering for only PSH packets (all keystrokes are PSH)
- I might need the extra information later so I'll hold on to it.
- Tested pairing network packets to keystrokes
- Difference between key and packet observation: ~11.8ms
- First SYN packet is within 100ms of SSH starting in log
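One simple way to estimate the key-to-packet delay: pair each keystroke with the first unused packet observed at or after it and take the median gap, which is less sensitive to occasional mismatches than the mean. This assumes sorted timestamps on comparable clocks and that a packet is never observed before its keystroke; the function name is hypothetical.

```python
from statistics import median


def estimate_delay(key_times: list[float], packet_times: list[float]) -> float:
    """Median delay (seconds) from each keystroke to the next unused packet.

    Assumes both lists are sorted Unix timestamps on comparable clocks.
    """
    delays = []
    j = 0
    for t in key_times:
        # Advance to the first packet observed no earlier than the keystroke.
        while j < len(packet_times) and packet_times[j] < t:
            j += 1
        if j == len(packet_times):
            break
        delays.append(packet_times[j] - t)
        j += 1  # each packet pairs with at most one keystroke
    if not delays:
        raise ValueError("no keystroke could be paired with a packet")
    return median(delays)
```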
2019-04-30
- Discussed thesis progress with DeBruhl
- Password entry is questionable
- Maybe write a script that generates an SSH key if asked for a password?
- Might have to just filter it out
- Decided port mapping might be unnecessary
- Ethan (357 student) expressed interest in project
- DeBruhl offered to do 400 in Fall
- Set up cron to automatically run the packet tap on reboot, plus AppArmor permissions
- Server suddenly cannot connect to Unix1
- Forgot to save IPTables rule
2019-05-01
- Started writing script to automatically pair keylog files to packet flows
- Got to the point where it could match the start of a file to a TCP SYN
2019-05-02
- Finished and tested script
- Was able to match all packets within a 50ms time difference
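The matching steps above (anchor the keylog to a nearby SYN, then pair keystrokes with packets inside a 50ms window) can be sketched as below. The 100ms SYN window comes from the 04-29 observation; the greedy two-pointer pairing and the function names are assumptions, and a greedy pass can miss the optimal pairing when keystrokes are closer together than the tolerance.

```python
from typing import Optional


def find_start_syn(log_start: float, syn_times: list[float],
                   window: float = 0.1) -> Optional[float]:
    """Return the SYN time closest to the keylog start, if within `window` seconds."""
    candidates = [s for s in syn_times if abs(s - log_start) <= window]
    return min(candidates, key=lambda s: abs(s - log_start), default=None)


def match_keys(key_times: list[float], packet_times: list[float],
               tolerance: float = 0.05) -> list[tuple[int, int]]:
    """Greedily pair sorted keystroke and packet times differing by <= tolerance.

    Returns (key_index, packet_index) pairs; keys missing from the result
    had no packet close enough.
    """
    pairs: list[tuple[int, int]] = []
    i = j = 0
    while i < len(key_times) and j < len(packet_times):
        diff = packet_times[j] - key_times[i]
        if abs(diff) <= tolerance:
            pairs.append((i, j))
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # packet too early for this key: skip the packet
        else:
            i += 1  # packet too late for this key: key stays unmatched
    return pairs
```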
2019-05-03
- Tested multiple keylog files.
- Discovered that different keylogs have different delays.
- Delay within one log file fairly consistent
2019-05-06
- Created scripts which copy keylog file to router VM.
2019-05-07
- Changes in delay attributable to AWS server instances changing
- Tested two SSH sessions at same time
- Found that it uses fd 5 instead of 4 - Not sure why
- Found issue with VM routing
- Apparently, upon reboot it adds tun0 without a VPN being turned on, then tries to route all traffic through the inactive device
- --No idea why.--
- Found old configuration. systemctl task was trying to open VPN separately.
- Set up VPN to automatically enable when VM boots
- Plan: Separate out each flow into a separate pcap
2019-05-08
- Worked on separating each TCP flow into a separate pcap
- Apparently editcap can't do this on its own so I'm writing my own utility for this
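Since editcap can't split by flow, the core of such a utility is grouping packets under a direction-independent connection key. A minimal sketch over already-parsed packet records; the Packet fields and function names are hypothetical, and reading or writing the actual pcap files would need a pcap library not shown here.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    time: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload_len: int


FlowKey = tuple[tuple[str, int], tuple[str, int]]


def flow_key(pkt: Packet) -> FlowKey:
    """Direction-independent key: both directions of a connection share it."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return (a, b) if a <= b else (b, a)


def split_flows(packets: list[Packet]) -> dict[FlowKey, list[Packet]]:
    """Group packets by connection, preserving their original order."""
    flows: dict[FlowKey, list[Packet]] = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

Sorting the two endpoints makes client-to-server and server-to-client packets land in the same flow. A fuller version would also start a new flow when it sees a fresh SYN on a reused address/port pair.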
2019-05-09
- Deployed VM to Unix machines upstairs
- Apparently most people don't have enough space to house VMs
- Might just have to make VMs smaller
- Tested stripping down unneeded packages, didn't do much.
- Might just have to use headless Ubuntu
- Tested running Dataset 1 with Griffian
- Issues with DNS resolution on VPN
- Packets did not seem to capture --(may have just not flushed yet)--
- Packets confirmed not captured.
- Total time ~20 (gave longer than expected answers)
- Set up Lubuntu VM (~1/2 the size)
2019-05-10
- Fixed issues with VirtualBox version differences (CSL was running 6, I was running 5)
- Gathered data from Lucy
- Wrote script to automatically filter the packets before sending them to the server.
2019-05-13
- Got VMs working on CSL computers - Had to install to /tmp for space reasons
- Tested client configuration with 3 people simultaneously
- One after another, each of us had the connection break
- Each break came after ~15 minutes, and each person re-established the connection before the next disconnect occurred
- Added prompt to script to give students a chance to review/approve data before submitting
2019-05-14
- Discussed progress with DeBruhl
- Worked on the disconnect problem
- --Seems like it happens after every 15 minutes almost exactly--
- Can't seem to make it happen at all now.
- Gathered data from a few students:
- One had a disconnect but no noticeable issues
- Another had a complete disconnect at the very end
- Testing was completely finished and I was able to manually upload the data
- Captured log of what happened
- OpenVPN seems to be flagging its own traffic as a replay attack after the network goes down
- Solution: Set up NTP server (virtual machine system clock is way off)
- Solution: Use TCP
2019-05-15
- Talked with Nico about testing O.S./S.S. sections
- Found bug that caused network to disconnect
- Network goes down -> link device removed -> route to VPN through device removed -> VPN traffic has no route
- It takes 2 minutes for OpenVPN to detect the network is broken
- Once OpenVPN tries to renegotiate, fixing the connection causes errors. Have to full restart OpenVPN
- Can fix within first two minutes by adding the route back
2019-05-16
- Gathered more data from 357 students
- Data velocity very slow
- Option: Bigger test - Might dissuade more people
- Option: Skip to dataset 2
- Option: More classes
- Option: Reduce dataset requirements
2019-05-17
- Asked for volunteers from O.S.
- Met with 357 students from Griffian's section/Elie
2019-05-18
- Checked that pcaps, log files, and consent forms line up
- Because of timezone offsets, some logs fall on a day boundary
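To illustrate how one timestamp can land on different calendar days depending on timezone, the sketch below compares the UTC date with the local date. It assumes the lab was on Pacific Daylight Time (UTC-7) in May 2019; the fixed offset and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time in the lab was PDT (UTC-7) during May 2019.
PDT = timezone(timedelta(hours=-7), "PDT")


def log_dates(unix_time: float) -> tuple[str, str]:
    """Return the calendar date of a timestamp in UTC and in local (PDT) time."""
    utc = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    local = utc.astimezone(PDT)
    return utc.date().isoformat(), local.date().isoformat()
```

A session starting at 02:00 UTC on 2019-05-18 is still 2019-05-17 locally, so its log and its consent form can appear to belong to different days.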
2019-07-20
- Finished refactoring packet matcher to work with flow-separated pcaps.
- Began looking over data
- Seems like a lot of files have incomplete data - may have to rerecord all of it.
2019-07-27
- Looked into issues with files
- At least one file seems like it has all the correct data and timestamps, but poor matches
- Seems packet-matcher has issues
- Fixed. Seems like it was a problem with my keylog pattern matching. FD is not reliable.
- Some files still fail
- Timestamps are way off. Some data has timestamps way after my last recorded flow
- Possibilities:
- Flow separator did not split off a file for these flows (best case)
- Unix lab system clocks are wrong (unlikely but recoverable case)
- Recording system did not capture the data (worst case)
- Investigation:
- See if I have a SYN packet for one of my problem flows
- No SYN packet found. Nothing close. Looking at server to see if I have data at the source.
- No packets recorded after 2019-05-21.
- I have several flows from 2019-05-23 that were missed.
- Other flows are from way earlier. May be a different problem.
- Some packet captures empty. Seems data capture system needs serious work.
2019-08-11
- Emailed Ethan
- Emailed DeBruhl
- Bought Amazon gift cards
- Looked for patterns in missing data
- None found
- May be a failure in data collection or in data parsing
- On the server, nothing from day 23, but there are 3 keylogs from that day.
- Maybe it has to do with the connection dropping randomly?
2019-08-22
- Collected data from work on personal server
- Collection done in lab to best recreate the scenario under which students worked.
2019-09-08
- Examined new data from 8-22
- Data collected fine at first, then truncated
- Truncation happened at midnight UTC (within 1 second), after exactly 2000 key strokes
- Both those numbers are extremely suspect, and that they happen to coincide is unfortunate
- Will investigate both these numbers
- Maybe data rotation did not pick up the existing SSH connection (unlikely)
- SSH may have undergone some kind of renegotiation (should be in logs if this is the case)
- Data is otherwise usable (only 20% data loss)
2019-09-12
- Examined script for rotating logs
- Uses SIGKILL
- Tested SIGKILL, doesn't flush buffer; SIGINT does
- Script now uses SIGINT
- Switched server to use PST so that logs rotate when no one is using it, just in case
- Will test changes tomorrow
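The SIGKILL vs. SIGINT difference can be reproduced with a minimal POSIX-only Python sketch, where a child Python process stands in for the keylogger (the child program and function name are hypothetical, not the actual rotation script). The child buffers a small write and then sleeps; SIGKILL ends it before the buffer is written out, while SIGINT raises KeyboardInterrupt, so the `with` block closes and flushes the file.

```python
import signal
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the keylogger: buffer a small write (well under
# Python's default buffer size), announce readiness, then wait for a signal.
CHILD = r"""
import signal, sys, time
signal.signal(signal.SIGINT, signal.default_int_handler)  # SIGINT -> KeyboardInterrupt
with open(sys.argv[1], "wb") as f:
    f.write(b"keystroke data\n")  # stays in the userspace buffer for now
    print("ready", flush=True)
    time.sleep(60)
"""


def bytes_left_after(sig: int) -> int:
    """Run the child, stop it with `sig`, and return how many bytes reached the file."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "keylog.bin"
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, str(out)],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # hide the KeyboardInterrupt traceback
            text=True,
        )
        proc.stdout.readline()  # block until the child has buffered its write
        proc.send_signal(sig)
        proc.wait(timeout=10)
        proc.stdout.close()
        return out.stat().st_size
```

Under these assumptions, `bytes_left_after(signal.SIGINT)` returns the 15 buffered bytes and `bytes_left_after(signal.SIGKILL)` returns 0. The child installs the SIGINT handler explicitly because a process started with SIGINT ignored would otherwise never see the interrupt.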
2019-09-23
- Above bullet was a lie. Testing today.
- Set up URL for raffle Captcha.
- Submitted flier for IRB review.
2019-09-24
- Worked out confusion with IRB about raffle
- Discussed plans for the quarter with DeBruhl
- Discussed work/plans for the project with Ethan
- Examined files from 09-23
- Files still truncated. Unknown reasons.
2019-09-25
- Emailed professors asking permission to advertise in their class
- Server hasn't been rebooted in 44 days.
- Script was updated 2 weeks ago. New script might not be running.
- Rebooted today.
- Will test that new script is working tonight.
2019-09-26
- Looked at matched file generated from yesterday
- Not all the keys matched
- All packets seem to be accounted for. Nothing visibly truncated
- Possible events happening:
- Some packets grouped together
- Some packets not recorded for unexplained reasons
- Some packets don't match the standard pattern
- Updated script to tag guided/freeform work
2019-09-27
- Handed new VM to Tedd for deployment
2019-09-30
- Checked if new VMs deployed (they are not).
- Upgraded packet matching script: succeeded on 9-25 data.
- Also seems to work with prior data that was formerly thought to be corrupted