In other words, machine learning can de-anonymize programmers from their source code or from compiled binaries.

Abstract syntax trees contain stylistic fingerprints that could be used to identify programmers from source code and binaries.
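To make the idea of a stylistic fingerprint concrete, here is a minimal sketch that counts how often each kind of syntax-tree node appears in a piece of code. It uses Python's built-in `ast` module purely as a stand-in; the article does not say which language or tooling the researchers used, and the function name `ast_node_frequencies` and both snippets are hypothetical.

```python
import ast
from collections import Counter


def ast_node_frequencies(source):
    """Return the relative frequency of each AST node type in `source`."""
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Two hypothetical snippets that compute the same sum in different styles.
loop_style = "total = 0\nfor x in range(10):\n    total += x\n"
functional_style = "total = sum(x for x in range(10))\n"

loop_features = ast_node_frequencies(loop_style)
functional_features = ast_node_frequencies(functional_style)

# The explicit loop produces a "For" node; the generator version does not.
print("For" in loop_features, "For" in functional_features)
```

Both snippets produce the same result, yet their trees differ in shape. Differences of this kind are what a style-based classifier would pick up on.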

They then narrowed the features to only include the ones that actually differentiate developers from each other.
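The article does not describe how the features were narrowed down, so the following is only an illustrative sketch. It keeps a feature when its average value differs noticeably between authors. The spread-of-means criterion, the `min_spread` threshold, the function name, and the sample data are all assumptions made for this example, not the researchers' method.

```python
from statistics import mean


def discriminative_features(samples_by_author, min_spread=0.05):
    """Keep features whose per-author mean differs by at least `min_spread`."""
    feature_names = {
        name
        for samples in samples_by_author.values()
        for sample in samples
        for name in sample
    }
    kept = []
    for name in sorted(feature_names):
        # Average value of this feature for each author (missing means 0.0).
        author_means = [
            mean(sample.get(name, 0.0) for sample in samples)
            for samples in samples_by_author.values()
        ]
        if max(author_means) - min(author_means) >= min_spread:
            kept.append(name)
    return kept


# Hypothetical feature vectors: two code samples per author.
samples_by_author = {
    "alice": [{"For": 0.10, "Name": 0.40}, {"For": 0.12, "Name": 0.41}],
    "bob": [{"For": 0.01, "Name": 0.40}, {"For": 0.00, "Name": 0.39}],
}

selected = discriminative_features(samples_by_author)
print(selected)
```

In this toy data both authors use `Name` nodes at nearly the same rate, so that feature says little about who wrote the code and is dropped, while `For` is kept.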

Machine Learning Could Help Identify the Author of Anonymous Code

Examples of a programmer's work are fed into the AI, which studies the structure of the code.

This approach trains an algorithm to recognize a programmer's coding structure based on examples of their work.
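As a rough illustration of learning from examples, the sketch below averages each author's feature vectors into a single profile and attributes new code to the author with the closest profile. The article does not name the algorithm the researchers used; this nearest-centroid classifier is a deliberately simple substitute, and all names and numbers in it are hypothetical.

```python
import math


def centroid(vectors):
    """Average a list of feature dicts into a single profile."""
    keys = {k for v in vectors for k in v}
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}


def distance(a, b):
    """Euclidean distance between two feature dicts (missing means 0.0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def train(samples_by_author):
    """Build one style profile per author from labeled examples."""
    return {author: centroid(samples) for author, samples in samples_by_author.items()}


def predict(model, features):
    """Return the author whose profile is closest to `features`."""
    return min(model, key=lambda author: distance(model[author], features))


# Hypothetical training data: syntax-feature frequencies per code sample.
training = {
    "alice": [{"For": 0.10, "GeneratorExp": 0.00}, {"For": 0.12, "GeneratorExp": 0.01}],
    "bob": [{"For": 0.00, "GeneratorExp": 0.08}, {"For": 0.01, "GeneratorExp": 0.10}],
}

model = train(training)
guess = predict(model, {"For": 0.11, "GeneratorExp": 0.00})
print(guess)
```

A real system would need far more samples and features per author, but the flow is the same: labeled examples go in, a per-author profile comes out, and unlabeled code is matched against those profiles.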

For testing, Caliskan and the other researchers used code samples from Google's annual Code Jam competition.


Where can it be used?

This approach could be used to identify malware authors or to investigate hacking incidents.

Future Work

Greenstadt and Caliskan plan to study how other factors might affect a person's coding style.

They also want to find out whether the same attribution methods could be applied uniformly across different programming languages.

"We're still trying to understand what makes something really attributable and what doesn't," says Greenstadt.

Source: Defcon


source: www.techworm.net