
Knowledge Map for Big Data Engineers

Source: 懂視網 Editor: 小采 Date: 2020-11-09 13:17:51

What do you actually need to know to work on big data in an enterprise? I think the question has two sides: technology and business. The technology side mainly covers probability and mathematical statistics, computer systems, algorithms, and programming; the business side varies with each company's line of business. Big data engineers need to learn to apply data mining methods, with the help of computer systems and programming tools, to solve real problems. Only then can they dig the drivers of business growth out of massive data and create more value for the company in a fiercely competitive market.

Business differs from company to company, but the technical points are shared. Here I briefly summarize the technical knowledge that big data engineers need to master. It mainly covers the fundamentals of databases, data warehouses, programming, distributed systems, the Hadoop ecosystem, data mining, and machine learning. Of course, what I list here is what a whole team should possess collectively; each person will have a different emphasis depending on their role in the team. This is only meant as a starting point, and comments are welcome.

| Topic | Content | Key points | Reference |
| --- | --- | --- | --- |
| DB/OLTP & DW/OLAP | Database/OLTP basics | The relational model, SQL, index/secondary index, inner join/left join/right join/full join, transaction/ACID | Ramakrishnan, Raghu, and Johannes Gehrke. Database Management Systems. |
| | Database internals & implementation | Architecture, memory management, storage/B+ tree, query parsing/optimization/execution, hash join/sort-merge join | |
| | Distributed and parallel databases | Sharding, database proxy | |
| | Data warehouse/OLAP | Materialized views, ETL, column-oriented storage, reporting, BI tools | |
| Basic programming | Programming languages | Java, Python (Pandas/NumPy/SciPy/scikit-learn), SQL, functional programming, R/SAS/SPSS | Wes McKinney. Python for Data Analysis: Agile Tools for Real World Data. |
| | OS | Linux | |
| | DB & DW systems | MySQL/Hive/Impala | |
| | Text formats and processing | JSON/XML, regex | |
| | Tools | Git/SVN, Maven | |
| Distributed system & Hadoop ecosystem & NoSQL | Distributed system principles and theory | CAP theorem, RPC (Protocol Buffers/Thrift/Avro), ZooKeeper, metadata management (HCatalog) | |
| | Distributed storage & computing framework & resource management | Hadoop/HDFS/MapReduce/YARN | Tom White. Hadoop: The Definitive Guide. Donald Miner, Adam Shook. MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems. |
| | SQL on Hadoop: data (log) acquisition/integration/fusion, normalization, feature extraction | Sqoop, Flume/Scribe/Chukwa, SerDe | Edward Capriolo, Dean Wampler, Jason Rutherglen. Programming Hive. |
| | SQL on Hadoop: query & in-database analytics | Hive, Impala, UDF/UDAF | |
| | Large-scale data mining & machine learning frameworks | Spark/MLbase, MR/Mahout | |
| | Stream processing | Storm | |
| | NoSQL | HBase/Cassandra (column-oriented databases) | Lars George. HBase: The Definitive Guide. |
| | | MongoDB (document database) | |
| | | Neo4j (graph database) | |
| | | Redis (cache) | |
| Data mining & machine learning | DM & ML basics | Numerical/categorical variables, training/test data, overfitting, bias/variance, precision/recall, tagging | |
| | Statistics | Data exploration (mean, median/range/standard deviation/variance/histogram), probability distributions (normal/Gaussian, Poisson), covariance, correlation coefficient, distance and similarity computation, Bayes' theorem, Monte Carlo method, hypothesis testing | |
| | Supervised learning | Classifiers, boosting, prediction, regression analysis | Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. |
| | Unsupervised learning | Clustering, deep learning | |
| | Collaborative filtering | Item-based CF, user-based CF | |
| | Algorithm: classifiers | Decision trees, KNN (k-nearest neighbors), SVM (support vector machines), SVD (singular value decomposition), naïve Bayes classifiers, neural networks | |
| | Algorithm: regression | Linear regression, logistic regression, ranking, perceptron | |
| | Algorithm: clustering | Hierarchical clustering, K-means clustering, spectral clustering | |
| | Algorithm: dimensionality reduction | PCA (principal component analysis), LDA (linear discriminant analysis), MDS (multidimensional scaling) | |
| | Text mining & information retrieval | Corpus, term-document matrix, term frequency & weight, association rules, market basket analysis, vocabulary mapping, sentiment analysis, tagging, PageRank, VSM (vector space model), inverted index | Jimmy Lin and Chris Dyer. Data-Intensive Text Processing with MapReduce. |
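
As a small illustration of one key point from the "DM & ML basic" row, precision and recall can be computed directly from true and predicted labels. The sketch below is a minimal example in plain Python; the function name `precision_recall` and the sample labels are made up for illustration and do not come from the original article.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))
```

Precision answers "of the items predicted positive, how many really are?", while recall answers "of the truly positive items, how many were found?". In practice a library such as scikit-learn, which the table already lists, would normally be used instead of hand-written counting.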
