
JPlag

Introduction: State-of-the-Art Software Plagiarism & Collusion Detection


JPlag finds pairwise similarities among a set of multiple programs. It can reliably detect software plagiarism and collusion in software development, even when obfuscated. All similarities are calculated locally; no source code or plagiarism results are ever uploaded online. JPlag supports a large number of languages.

📈 JPlag Demo
📖 JPlag Wiki
🏛️ JPlag on Helmholtz RSD
🤩 Give us Feedback in a short (<5 min) survey

Supported Languages

All supported languages and their supported versions are listed below.

Language	Version	CLI Argument Name	State	Parser
Java	25	java	mature	JavaC
C	11	c	legacy	JavaCC
C++	14	cpp	mature	ANTLR 4
C#	6	csharp	mature	ANTLR 4
Python	3.6	python3	mature	ANTLR 4
JavaScript	ES6	javascript	beta	ANTLR 4
TypeScript	~5	typescript	beta	ANTLR 4
Go	1.17	golang	beta	ANTLR 4
Kotlin	1.3	kotlin	mature	ANTLR 4
R	3.5.0	rlang	mature	ANTLR 4
Rust	1.60.0	rust	mature	ANTLR 4
Swift	5.4	swift	beta	ANTLR 4
Scala	2.13.8	scala	mature	Scalameta
LLVM IR	15	llvmir	beta	ANTLR 4
Scheme	?	scheme	legacy	JavaCC
EMF Metamodel	2.25.0	emf	beta	EMF
EMF Model	2.25.0	emf-model	alpha	EMF
SCXML	1.0	scxml	alpha	XML
Text (naive, use with caution)	-	text	legacy	CoreNLP
Multi-Language	-	multi	alpha	-

Download and Installation

You need Java SE 25 to run or build JPlag.
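A quick way to confirm that a runtime meets this requirement is to ask the JVM for its feature release number. The minimal sketch below uses the standard Runtime.version() API; the class and method names (JavaVersionCheck, meetsRequirement) are hypothetical and not part of JPlag.

```java
public class JavaVersionCheck {

    // Minimum Java feature release required by JPlag, as stated above.
    static final int REQUIRED_FEATURE_RELEASE = 25;

    // Returns true if the running JVM's feature release (e.g. 25 for Java 25)
    // is at least the given number.
    static boolean meetsRequirement(int requiredFeatureRelease) {
        return Runtime.version().feature() >= requiredFeatureRelease;
    }

    public static void main(String[] args) {
        int current = Runtime.version().feature();
        if (meetsRequirement(REQUIRED_FEATURE_RELEASE)) {
            System.out.println("Java " + current + " is sufficient for JPlag.");
        } else {
            System.out.println("Java " + current + " is too old; JPlag needs Java SE "
                    + REQUIRED_FEATURE_RELEASE + ".");
        }
    }
}
```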

Downloading a release

Download a released version.
If you depend on the legacy version of JPlag, please refer to the legacy release v2.12.1 and the legacy branch.

Via Maven

JPlag is released on Maven Central and can be included as follows:

<dependency>
  <groupId>de.jplag</groupId>
  <artifactId>jplag</artifactId>
  <version><!--desired version--></version>
</dependency>

Building from sources

Download or clone the code from this repository.
Run mvn clean package from the repository root to compile and build all submodules. Run mvn clean package assembly:single instead if you need the full JAR, which includes all dependencies. Run mvn -P with-report-viewer clean package assembly:single to build the full JAR with the report viewer; this additionally requires Node.js to be installed.
You will find the generated JARs (jplag-x.y.z-jar-with-dependencies.jar) in the subdirectory cli/target.
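When scripting around a build, the version part of the JAR name (x.y.z) changes between releases. The sketch below locates the full JAR in cli/target with a glob pattern instead of a hard-coded version. The class and method names are hypothetical; if several matching JARs are present, it returns whichever one the directory listing yields first, since that order is unspecified.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.Optional;

public class FindJplagJar {

    // Matches jplag-x.y.z-jar-with-dependencies.jar for any version x.y.z.
    static final String JAR_GLOB = "jplag-*-jar-with-dependencies.jar";

    // Returns a matching JAR inside targetDir, or empty if the directory
    // does not exist or contains no such JAR.
    static Optional<Path> findFullJar(Path targetDir) throws IOException {
        if (!Files.isDirectory(targetDir)) {
            return Optional.empty();
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(targetDir, JAR_GLOB)) {
            Iterator<Path> matches = stream.iterator();
            return matches.hasNext() ? Optional.of(matches.next()) : Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        Path targetDir = Path.of("cli", "target"); // relative to the repository root
        findFullJar(targetDir).ifPresentOrElse(
                jar -> System.out.println("Full JAR: " + jar),
                () -> System.out.println("No full JAR in " + targetDir + "; run the assembly build first."));
    }
}
```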

Usage

JPlag can be used either via the CLI or directly via its Java API. For more information, see the usage information in the wiki. If you are using the CLI, the report viewer UI will launch automatically. No data will leave your computer!

CLI

Note that the legacy CLI differs slightly. The language can be set either with the -l parameter or as a subcommand (jplag [jplag options] -l <language name> [language options]). A subcommand takes priority over the -l option. Language-specific arguments can be set when using the subcommand. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g., jplag java -h).
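If you want to drive the CLI from another Java program, one option is to launch the built JAR as a subprocess. The sketch below assembles a command line from options listed in the parameter descriptions of this section (-l, -r, -M) and runs it with ProcessBuilder. The class name JplagCommand and all paths are hypothetical, and reading -M RUN as "compute the results without opening the viewer" is an assumption based on the mode names.

```java
import java.util.ArrayList;
import java.util.List;

public class JplagCommand {

    // Builds the argument list for running the JPlag CLI JAR on one root directory:
    // -l selects the language, -r names the result file, -M RUN selects the mode.
    static List<String> build(String jarPath, String language, String resultFile, String rootDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-l");
        cmd.add(language);
        cmd.add("-r");
        cmd.add(resultFile);
        cmd.add("-M");
        cmd.add("RUN");
        cmd.add(rootDir); // positional root-dirs parameter
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths; replace x.y.z with the version you built or downloaded.
        List<String> cmd = build("cli/target/jplag-x.y.z-jar-with-dependencies.jar",
                "java", "results", "/path/to/rootDir");
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("JPlag exited with code " + process.waitFor());
    }
}
```

Keeping the command as a list of separate arguments (rather than one shell string) avoids quoting problems with paths that contain spaces.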

Parameter descriptions: 
      [root-dirs[,root-dirs...]...]
                        Root-directory with submissions to check for
                          plagiarism. If mode is set to VIEW, this parameter
                          can be used to specify a report file to open. In that
                          case only a single file may be specified.
      -bc, --bc, --base-code=<baseCode>
                        Path to the base code directory (common framework used
                          in all submissions).
      -l, --language=<language>
                        Select the language of the submissions (default: java).
                          See subcommands below.
      -M, --mode=<{RUN, VIEW, RUN_AND_VIEW, AUTO}>
                        The mode of JPlag. One of: RUN, VIEW, RUN_AND_VIEW,
                          AUTO (default: null). If VIEW is chosen, you can
                          optionally specify a path to an existing report.
      -n, --shown-comparisons=<shownComparisons>
                        The maximum number of comparisons that will be shown in
                          the generated report, if set to -1 all comparisons
                          will be shown (default: 2500)
      -new, --new=<newDirectories>[,<newDirectories>...]
                        Root-directories with submissions to check for
                          plagiarism (same as root).
      --normalize       Activate the normalization of tokens. Supported for
                          languages: Java, C++.
      -old, --old=<oldDirectories>[,<oldDirectories>...]
                        Root-directories with prior submissions to compare
                          against.
      -r, --result-file=<resultFile>
                        Name of the file in which the comparison results will
                          be stored (default: results). Missing .jplag
                          extension will be automatically added.
      -t, --min-tokens=<minTokenMatch>
                        Tunes the comparison sensitivity by adjusting the
                          minimum number of tokens required to be counted as
                          a matching section. A smaller value increases the
                          sensitivity but might lead to more false positives.

Advanced
      --csv-export      Export pairwise similarity values as a CSV file.
      -d, --debug           Store non-parsable files in error folder.
      --encoding=<submissionCharsetOverride>
                        Specifies the charset of the submissions. This disables
                          the automatic charset detection
      --log-level=<{ERROR, WARN, INFO, DEBUG, TRACE}>
                        Set the log level for the cli.
      -m, --similarity-threshold=<similarityThreshold>
                        Comparison similarity threshold [0.0-1.0]: All
                          comparisons above this threshold will be saved
                          (default: 0.0).
      --overwrite       Existing result files will be overwritten.
      -p, --suffixes=<suffixes>[,<suffixes>...]
                        comma-separated list of all filename suffixes that are
                          included.
      -P, --port=<port>     The port used for the internal report viewer (default:
                          1996).
      -s, --subdirectory=<subdirectory>
                        Look in directories <root-dir>/*/<dir> for programs.
      -x, --exclusion-file=<exclusionFileName>
                        All files named in this file will be ignored in the
                          comparison (line-separated list).

Clustering
      --cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
                        Specifies the clustering algorithm. Available
                          algorithms: agglomerative, spectral (default:
                          spectral).
      --cluster-metric=<{AVG, MIN, MAX, INTERSECTION, LONGEST_MATCH,
        MAXIMUM_LENGTH}>
                        The similarity metric used for clustering. Available
                          metrics: average similarity, minimum similarity,
                          maximal similarity, matched tokens, number of tokens
                          in the longest match, length of the longer submission
                          (default: average similarity).
      --cluster-skip    Skips the cluster calculation.

Subsequence Match Merging
      --gap-size=<maximumGapSize>
                        Maximal gap between neighboring matches to be merged
                          (between 1 and minTokenMatch, default: 6).
      --match-merging   Enables merging of neighboring matches to counteract
                          obfuscation attempts.
      --neighbor-length=<minimumNeighborLength>
                        Minimal length of neighboring matches to be merged
                          (between 1 and minTokenMatch, default: 2).
      --required-merges=<minimumRequiredMerges>
                        Minimal required merges for the merging to be applied
                          (between 1 and 50, default: 6).

Frequency Analysis
      --analysis-strategy=<{COMPLETE_MATCHES, CONTAINED_MATCHES, SUBMATCHES,
        MATCH_WINDOWS}>
                        Specifies the strategy for frequency analysis, one of:
                          COMPLETE_MATCHES, CONTAINED_MATCHES, SUBMATCHES,
                          MATCH_WINDOWS (default: COMPLETE_MATCHES).
      --frequency       Enables analysis and highlighting of rare matches.
      --weighting=<{PROPORTIONAL, LINEAR, QUADRATIC, SIGMOID}>
                        The function for frequency-based match weighting, one
                          of: PROPORTIONAL, LINEAR, QUADRATIC, SIGMOID
                          (default: SIGMOID).
Languages:
  c
  cpp
  csharp
  emf
  emf-model
  go
  java
  javascript
  kotlin
  llvmir
  multi
  python3
  rlang
  rust
  scala
  scheme
  scxml
  swift
  text
  typescript
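The options -p, -x, -m, and -n narrow down which files are parsed and which comparisons end up in the report. The following is a minimal sketch of that behavior as the help texts describe it, not JPlag's actual implementation; the class, record, and method names are hypothetical. It assumes that suffixes are matched against file-name endings, that exclusion entries are plain file names, that the threshold is inclusive (so the default 0.0 keeps every comparison), and that -n keeps the most similar comparisons first.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class OptionFilters {

    // Hypothetical stand-in for one pairwise comparison in a result.
    record Comparison(String first, String second, double similarity) {}

    // Parses the contents of a line-separated exclusion file (-x),
    // ignoring blank lines and surrounding whitespace.
    static Set<String> parseExclusionFile(String content) {
        return content.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toSet());
    }

    // -p and -x: keep files that end with one of the suffixes
    // and are not named in the exclusion set.
    static List<String> selectFiles(List<String> fileNames, List<String> suffixes, Set<String> excluded) {
        return fileNames.stream()
                .filter(name -> suffixes.stream().anyMatch(name::endsWith))
                .filter(name -> !excluded.contains(name))
                .collect(Collectors.toList());
    }

    // -m and -n: keep comparisons at or above the threshold, most similar first,
    // capped at maxShown entries; a negative maxShown (such as -1) means no cap.
    static List<Comparison> selectComparisons(List<Comparison> all, double threshold, int maxShown) {
        return all.stream()
                .filter(c -> c.similarity() >= threshold)
                .sorted(Comparator.comparingDouble(Comparison::similarity).reversed())
                .limit(maxShown < 0 ? Long.MAX_VALUE : maxShown)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> excluded = parseExclusionFile("Generated.java\n");
        System.out.println(selectFiles(
                List.of("Main.java", "Generated.java", "README.md"), List.of(".java"), excluded));
        // prints [Main.java]

        List<Comparison> all = List.of(
                new Comparison("alice", "bob", 0.82),
                new Comparison("alice", "carol", 0.10),
                new Comparison("bob", "carol", 0.45));
        selectComparisons(all, 0.3, 2).forEach(System.out::println);
    }
}
```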

Java API

The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:

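import java.io.File;
import java.io.FileNotFoundException;
import java.util.Set;

import de.jplag.JPlag;
import de.jplag.JPlagResult;
import de.jplag.Language;
import de.jplag.exceptions.ExitException;
import de.jplag.java.JavaLanguage;
import de.jplag.options.JPlagOptions;
import de.jplag.reporting.reportobject.ReportObjectFactory;

Language language = new JavaLanguage();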
Set<File> submissionDirectories = Set.of(new File("/path/to/rootDir"));
File baseCode = new File("/path/to/baseCode");
JPlagOptions options = new JPlagOptions(language, submissionDirectories, Set.of()).withBaseCodeSubmissionDirectory(baseCode);

try {
    JPlagResult result = JPlag.run(options);

    // Optional
    ReportObjectFactory reportObjectFactory = new ReportObjectFactory(new File("/path/to/output"));
    reportObjectFactory.createAndSaveReport(result);
} catch (ExitException e) {
    // error handling here
} catch (FileNotFoundException e) {
    // handle IO exception here
}

Contributing

We're happy to incorporate all improvements to JPlag into this codebase. Feel free to fork the project and send pull requests. Please consider our guidelines for contributions.

Contact

If you encounter bugs or other issues, please report them here. For other purposes, you can contact us at jplag@ipd.kit.edu. We would love to hear about your research related to JPlag. Feel free to contact us!
