In other words, machine learning can de-anonymize programmers from either source code or compiled binaries.
The abstract syntax trees contain stylistic fingerprints that can potentially be used to identify programmers from code and binaries.
The researchers then narrowed the features down to only those that actually differentiate developers from one another.
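As a rough illustration of the idea, and not the researchers' actual pipeline, the sketch below uses Python's standard `ast` module to turn source code into node-type frequencies, then ranks features by how much their average value differs between authors. The function names, the two sample snippets, and the spread-based ranking are all assumptions made for this example.

```python
import ast
from collections import Counter


def ast_features(source):
    """Return the relative frequency of each AST node type in Python source."""
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def discriminative_features(samples_by_author, top_k=5):
    """Rank features by the gap between the highest and lowest per-author mean.

    This is a simple stand-in for real feature selection (hypothetical choice).
    """
    means = {}
    for author, sources in samples_by_author.items():
        feats = [ast_features(s) for s in sources]
        names = set().union(*feats)
        means[author] = {
            n: sum(f.get(n, 0.0) for f in feats) / len(feats) for n in names
        }
    all_names = set().union(*means.values())
    spread = {}
    for n in all_names:
        values = [m.get(n, 0.0) for m in means.values()]
        spread[n] = max(values) - min(values)
    return sorted(spread, key=spread.get, reverse=True)[:top_k]


# Two made-up authors with different habits: explicit loops vs. comprehensions.
samples = {
    "alice": ["def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"],
    "bob": ["def total(xs):\n    return sum([x for x in xs])\n"],
}

print(discriminative_features(samples, top_k=5))
```

Features that every author uses at the same rate get a spread near zero and fall to the bottom of the ranking, which mirrors the narrowing step described above.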
Examples of a programmer's work are fed into the AI, which studies the coding structure.
This approach trains an algorithm to recognize a programmer's coding structure based on examples of their work.
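To make the training step concrete, here is a minimal sketch that learns one average feature vector per author from labelled examples and attributes new code to the closest author by cosine similarity. The article does not name the learning algorithm the researchers used, so the nearest-centroid classifier here is a swapped-in stand-in, and `featurize`, `train`, `attribute`, and the sample snippets are hypothetical names and data chosen for this example.

```python
import ast
import math
from collections import Counter, defaultdict


def featurize(source):
    """Relative frequency of each AST node type in a piece of Python source."""
    counts = Counter(type(n).__name__ for n in ast.walk(ast.parse(source)))
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def train(labelled_sources):
    """Build one mean feature vector per author from (author, source) pairs."""
    sums = defaultdict(Counter)
    n_samples = Counter()
    for author, source in labelled_sources:
        sums[author].update(featurize(source))  # adds the float frequencies
        n_samples[author] += 1
    return {
        author: {k: v / n_samples[author] for k, v in vec.items()}
        for author, vec in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def attribute(model, source):
    """Return the author whose mean vector is most similar to the source."""
    feats = featurize(source)
    return max(model, key=lambda author: cosine(feats, model[author]))


# Made-up training data: one author prefers loops, the other comprehensions.
training = [
    ("alice", "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"),
    ("bob", "def total(xs):\n    return sum([x for x in xs])\n"),
]
model = train(training)

unknown = "def count(items):\n    n = 0\n    for item in items:\n        n += 1\n    return n\n"
print(attribute(model, unknown))
```

With only a handful of samples this is a toy. Real attribution needs many examples per programmer and a far richer feature set than node-type counts.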
For the testing, Caliskan and the other researchers used code samples from Google's annual Code Jam competition.
Where can it be used?
This approach could be used for identifying malware creators or investigating instances of hacks.
Future Work
Greenstadt and Caliskan plan to study how other factors might affect a person's coding style.
They also want to find out whether the same attribution methods could be applied uniformly across different programming languages.
"We're still trying to understand what makes something really attributable and what doesn't," says Greenstadt.
Source: Defcon
source: www.techworm.net