ParaTen

  • This project implements the CP-ALS algorithm on Spark, along with the related vector, matrix, and tensor operations.
  • The aim of the project is to use the power of RDDs to achieve fast iteration while performing the decomposition.
  • The project is also an exploration of type-safe tensor operations, inspired by TensorSafe (see the sketch below).
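
A minimal sketch of the type-safety idea: encode the tensor's shape as a phantom type so that shape mismatches are rejected by the Scala compiler instead of failing at runtime. The names below are hypothetical and are not ParaTen's actual API.

// Hypothetical illustration of type-safe tensor operations (not ParaTen's API).
sealed trait Shape
sealed trait Dim2x3 extends Shape // tag for a 2 x 3 tensor
sealed trait Dim3x2 extends Shape // tag for a 3 x 2 tensor

final case class Tensor[S <: Shape](values: Vector[Double])

object Tensor {
  // Element-wise addition is only defined for operands that share a shape tag.
  def add[S <: Shape](a: Tensor[S], b: Tensor[S]): Tensor[S] =
    Tensor[S](a.values.zip(b.values).map { case (x, y) => x + y })
}

// Tensor.add(t1, t2) compiles only when t1 and t2 carry the same Shape tag;
// adding a Tensor[Dim2x3] to a Tensor[Dim3x2] is rejected at compile time.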

Build and Run

1. Environment

  • sbt 0.13.15
  • Scala 2.11.11
  • Spark 2.1.1
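
A minimal build.sbt sketch matching the versions above; which Spark modules the project actually depends on (e.g. spark-mllib) is an assumption, so check the repository's build file.

// build.sbt (sketch, assumed dependencies)
name := "ParaTen"
version := "1.0"
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)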

2. Compile and package

# package without dependencies
sbt package

This produces the package target/scala-2.11/paraten_2.11-1.0.jar.

# package with all dependencies
sbt assembly

This produces a single JAR with all dependencies bundled: target/scala-2.11/ParaTen-assembly-1.0.jar.
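
sbt assembly requires the sbt-assembly plugin. A sketch of project/plugins.sbt follows; the plugin version is an assumption, so check the repository's own plugins file.

// project/plugins.sbt (sketch)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")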

3. Usage

CP decomposition on Spark
Usage: ParaCP [options]

  -s, --shape <value>     shape
  -r, --rank <value>      rank
  --maxIter <value>       number of iterations of ALS. default: 500
  --tol <value>           tolerance for the ALS. default: 0.001
  --tries <value>         tries
  -o, --output-dir <dir>  output write path.
  -i, --input <value>     path of input file.

-s/--shape, -r/--rank, and -i/--input are required.
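
The option summary above resembles the help output of the scopt library; whether ParaCP actually uses scopt, and how its parser is written, is an assumption. A minimal sketch of declaring such options with scopt 3.x:

// Hypothetical option parser for the flags listed above (not ParaCP's actual code).
import scopt.OptionParser

object ParaCPOptions {
  case class Config(shape: Seq[Int] = Seq.empty,
                    rank: Int = 0,
                    maxIter: Int = 500,
                    tol: Double = 0.001,
                    tries: Int = 1,
                    outputDir: String = "",
                    input: String = "")

  val parser = new OptionParser[Config]("ParaCP") {
    head("CP decomposition on Spark")
    opt[Seq[Int]]('s', "shape").required().action((x, c) => c.copy(shape = x)).text("shape")
    opt[Int]('r', "rank").required().action((x, c) => c.copy(rank = x)).text("rank")
    opt[Int]("maxIter").action((x, c) => c.copy(maxIter = x)).text("number of iterations of ALS. default: 500")
    opt[Double]("tol").action((x, c) => c.copy(tol = x)).text("tolerance for the ALS. default: 0.001")
    opt[Int]("tries").action((x, c) => c.copy(tries = x)).text("tries")
    opt[String]('o', "output-dir").action((x, c) => c.copy(outputDir = x)).text("output write path")
    opt[String]('i', "input").required().action((x, c) => c.copy(input = x)).text("path of input file")
  }

  // Returns Some(config) on success, None if parsing fails.
  def parse(args: Array[String]): Option[Config] = parser.parse(args, Config())
}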

4. Example call

Use spark-submit to run ParaCP on a cluster:

spark-submit \
--class org.chaomai.paraten.apps.ParaCP \
--master spark://Chaos-MacBook-Pro.local:7077 \
--total-executor-cores=4 \
--executor-memory=2g \
target/scala-2.11/ParaTen-assembly-1.0.jar \
-s 2,2,3,2 -r 5 \
-i hdfs://localhost:9000/user/chaomai/paraten/data/test_dim4_dense.tensor \
-o hdfs://localhost:9000/user/chaomai/paraten/result \
--maxIter 30 --tol 0.1 --tries 2

You can also use ParaCP_cluster_deploy_mode.sh or ParaCP_client_deploy_mode.sh to run it. Both scripts accept the same positional parameters:

[dim_1,..,dim_N (tensor)] [rank] [tensor file path] [output path] \
[max iteration] [tolerance] [tries] [master of Spark] \
[total-executor-cores] [executor-memory]

The first script submits with --deploy-mode cluster.

./src/main/scala/org/chaomai/paraten/apps/ParaCP_cluster_deploy_mode.sh \
2,2,3,2 5 \
hdfs://localhost:9000/user/chaomai/paraten/data/test_dim4_dense.tensor \
hdfs://localhost:9000/user/chaomai/paraten/result \
30 0.1 2 \
spark://Chaos-MacBook-Pro.local:6066 \
4 2g
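
The cluster-mode script targets port 6066, Spark's REST submission endpoint used by --deploy-mode cluster, while the spark-submit example above uses the standard master port 7077. Assuming the client-mode script lives alongside the cluster one and takes the same positional arguments, a client-mode call would presumably look like:

./src/main/scala/org/chaomai/paraten/apps/ParaCP_client_deploy_mode.sh \
2,2,3,2 5 \
hdfs://localhost:9000/user/chaomai/paraten/data/test_dim4_dense.tensor \
hdfs://localhost:9000/user/chaomai/paraten/result \
30 0.1 2 \
spark://Chaos-MacBook-Pro.local:7077 \
4 2g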

5. Example decomposition

  • original image

[Lenna test image]

  • reconstructed images from the factor matrices

Set the rank to 1000 and iterate until convergence. Then pick the top-k rank-one components according to the lambda vector and reconstruct the tensor, i.e. the image.

norm = \lVert \mathcal{X} - \mathcal{X}' \rVert = \sqrt{\sum_{i_1 = 1}^{I_1} \sum_{i_2 = 1}^{I_2} \cdots \sum_{i_N = 1}^{I_N} \left(x_{i_1 i_2 \cdots i_N} - x'_{i_1 i_2 \cdots i_N}\right)^2}
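
For a 3-way tensor such as an image (height x width x channels), the rank-k reconstruction from factor matrices A, B, C and the weight vector lambda is x'_{ijl} = \sum_{r=1}^{k} \lambda_r a_{ir} b_{jr} c_{lr}, and the norm above then measures the reconstruction error. A minimal sketch in plain Scala, with a hypothetical factor layout rather than ParaTen's API:

// Rank-k reconstruction of a 3-way tensor from CP factors plus the
// reconstruction error defined by the norm above.
// Assumed layout: a is I x R, b is J x R, c is L x R, lambda has length R, k <= R.
object CpReconstruction {
  def reconstruct(k: Int,
                  lambda: Array[Double],
                  a: Array[Array[Double]],
                  b: Array[Array[Double]],
                  c: Array[Array[Double]]): Array[Array[Array[Double]]] =
    Array.tabulate(a.length, b.length, c.length) { (i, j, l) =>
      (0 until k).map(r => lambda(r) * a(i)(r) * b(j)(r) * c(l)(r)).sum
    }

  // Frobenius norm of the difference between the original and reconstructed tensors.
  def reconstructionNorm(x: Array[Array[Array[Double]]],
                         xApprox: Array[Array[Array[Double]]]): Double =
    math.sqrt((for {
      i <- x.indices
      j <- x(i).indices
      l <- x(i)(j).indices
    } yield {
      val d = x(i)(j)(l) - xApprox(i)(j)(l)
      d * d
    }).sum)
}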

[Table: top-k rank (200, 400, 600, 800, 1000) vs. norm and reconstructed image; norm values and images omitted]

References

  • An Investigation of Sparse Tensor Formats for Tensor Libraries. Parker Allen Tew. 2015.
  • HaTen2: Billion-scale Tensor Decompositions. Inah Jeon, Evangelos E. Papalexakis, U Kang, Christos Faloutsos. 31st IEEE International Conference on Data Engineering (ICDE) 2015, Seoul, Korea.
  • Scalable Tensor Decompositions for Multi-aspect Data Mining. Tamara G. Kolda, Jimeng Sun. IEEE International Conference on Data Mining (ICDM '08), 2008.
  • Tensor Decompositions and Applications. Tamara G. Kolda, Brett W. Bader. SIAM Review Volume 51 Issue 3, August 2009.

Thanks

License

Apache License.