Filters in HBase (or intra row scanning part II)

Source: 懂視網 · Editor: 小采 · Time: 2020-11-09 13:32:19

Filters in HBase are a somewhat obscure and under-documented feature. (Even we committers are often not aware of their usefulness; see HBASE-5229 and HBASE-4256... Or maybe it's just me...)

Intra-row scanning can be done using ColumnRangeFilter. Other filters, such as ColumnPrefixFilter or MultipleColumnPrefixFilter, might also be handy for this. All three filters have in common that they can provide scanners (see Scanning in HBase) with what I will call "seek hints". These hints allow a scanner to seek to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than a dumb filter that is passed every cell and only decides whether that cell is included in the result.

Many other filters also provide these "seek hints". The exceptions are filters that filter on column values: there is no inherent ordering among column values, so these filters need to look at the value of every column.
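To make the difference concrete, here is a toy model in plain Java. All class and method names are hypothetical; this is not the HBase client or filter API, just a sketch under simplifying assumptions. One wide row is modeled as a sorted map from qualifier to value. The column-range lookup jumps straight to the first qualifier at or after the lower bound and stops at the upper bound, roughly the way a seek hint plus NEXT_ROW lets a range filter skip most of a row, while the value filter has no ordering to exploit and must examine every cell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (hypothetical names, not the HBase API): one wide row as a map from
// column qualifier to value, kept in sorted order like the cells of an HBase row.
class IntraRowScanDemo {
    // Number of cells the last call had to look at.
    static int cellsExamined;

    // Range lookup that exploits column ordering: tailMap jumps straight to the
    // first qualifier >= min (like a seek), and the loop stops at the first
    // qualifier >= max (like NEXT_ROW), so the rest of the row is never visited.
    static List<String> columnRange(NavigableMap<String, String> row, String min, String max) {
        cellsExamined = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : row.tailMap(min, true).entrySet()) {
            cellsExamined++;
            if (e.getKey().compareTo(max) >= 0) {
                break; // past the range: skip the remainder of the row
            }
            out.add(e.getKey());
        }
        return out;
    }

    // Value filter: values are not ordered, so every cell must be examined.
    static List<String> valueEquals(NavigableMap<String, String> row, String wanted) {
        cellsExamined = 0;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            cellsExamined++;
            if (e.getValue().equals(wanted)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Builds a row with n columns c000000, c000001, ... whose values cycle v0..v9.
    static NavigableMap<String, String> wideRow(int n) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            row.put(String.format("c%06d", i), "v" + (i % 10));
        }
        return row;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = wideRow(100_000);
        List<String> range = columnRange(row, "c050000", "c050010");
        System.out.println("range: " + range.size() + " columns, " + cellsExamined + " cells examined");
        List<String> byValue = valueEquals(row, "v3");
        System.out.println("value: " + byValue.size() + " columns, " + cellsExamined + " cells examined");
    }
}
```

On this 100,000-column row, the range lookup for [c050000, c050010) looks at 11 cells (the 10 matches plus the first cell past the range), while the value filter has to look at all 100,000 cells to find its 10,000 matches.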

For example, check out this code in MultipleColumnPrefixFilter (Apache License 2.0):
TreeSet<byte []> lesserOrEqualPrefixes =
  (TreeSet<byte []>) sortedPrefixes.headSet(qualifier, true);
if (lesserOrEqualPrefixes.size() != 0) {
  byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
  if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
    // the qualifier starts with one of the prefixes: include this cell
    return ReturnCode.INCLUDE;
  }
  if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
    // no larger prefix is left to seek to: move on to the next row
    return ReturnCode.NEXT_ROW;
  } else {
    // seek ahead to the next larger prefix
    hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
    return ReturnCode.SEEK_NEXT_USING_HINT;
  }
} else {
  // every prefix sorts after this qualifier: seek to the smallest one
  hint = sortedPrefixes.first();
  return ReturnCode.SEEK_NEXT_USING_HINT;
}
(the hint is used later to skip ahead to that column prefix)

See how this code snippet allows the filter to:

  1. seek to the next row if all prefixes are known to be less than or equal to the current qualifier (and the largest one did not match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns.
  2. seek to the next larger prefix if there are more prefixes but the current one does not match the qualifier.
  3. seek to the first (smallest) prefix if none of the prefixes are less than or equal to the current qualifier.

If you didn't feel like looking at the code, the takeaway is that these filters can be used safely and efficiently on very wide rows. If the filter could instead only return INCLUDE or SKIP, and so were forced to visit every version of every column of every row, it would be inefficient for wide rows with hundreds of thousands or millions of columns.
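The three cases can be tried out with a small, self-contained port of the MultipleColumnPrefixFilter snippet. This is a sketch under simplifying assumptions, not HBase code: qualifiers are Strings instead of byte[], ReturnCode is a local enum that only mirrors the names of HBase's return codes, and scanRow is a hypothetical stand-in for the region scanner that simply honors INCLUDE, NEXT_ROW and SEEK_NEXT_USING_HINT within a single sorted row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy port (hypothetical names, not the HBase API) of the prefix-matching
// decision logic, with String qualifiers in place of byte[].
class PrefixSeekDemo {
    // Local enum that only mirrors the names of HBase's filter return codes.
    enum ReturnCode { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

    final TreeSet<String> sortedPrefixes;
    String hint;   // prefix to seek to after SEEK_NEXT_USING_HINT
    int examined;  // how many qualifiers the filter was asked about in scanRow

    PrefixSeekDemo(String... prefixes) {
        sortedPrefixes = new TreeSet<>(Arrays.asList(prefixes));
    }

    ReturnCode filterColumn(String qualifier) {
        NavigableSet<String> lesserOrEqualPrefixes = sortedPrefixes.headSet(qualifier, true);
        if (!lesserOrEqualPrefixes.isEmpty()) {
            String largest = lesserOrEqualPrefixes.last();
            if (qualifier.startsWith(largest)) {
                return ReturnCode.INCLUDE;
            }
            if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
                return ReturnCode.NEXT_ROW;          // case 1: no larger prefix left
            }
            hint = sortedPrefixes.higher(largest);   // case 2: next larger prefix
            return ReturnCode.SEEK_NEXT_USING_HINT;
        }
        hint = sortedPrefixes.first();               // case 3: smallest prefix
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }

    // Hypothetical stand-in for a scanner over one row's sorted qualifiers that
    // honors the return codes. Each hint is strictly greater than the current
    // qualifier, so the loop always makes progress.
    List<String> scanRow(NavigableSet<String> row) {
        examined = 0;
        List<String> out = new ArrayList<>();
        String current = row.isEmpty() ? null : row.first();
        while (current != null) {
            examined++;
            ReturnCode rc = filterColumn(current);
            if (rc == ReturnCode.INCLUDE) {
                out.add(current);
                current = row.higher(current);   // step to the next column
            } else if (rc == ReturnCode.NEXT_ROW) {
                break;                           // skip the rest of the row
            } else {
                current = row.ceiling(hint);     // seek to first qualifier >= hint
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> row = new TreeSet<>();
        for (char c = 'a'; c <= 'j'; c++) {
            for (int i = 0; i < 10_000; i++) {
                row.add(String.format("%c%04d", c, i));
            }
        }
        PrefixSeekDemo filter = new PrefixSeekDemo("c000", "f123");
        List<String> hits = filter.scanRow(row);
        System.out.println(hits.size() + " matches, " + filter.examined
            + " of " + row.size() + " qualifiers examined");
    }
}
```

Run as-is, the sketch builds a 100,000-column row (a0000 through j9999) and filters it with the prefixes c000 and f123: the filter is consulted 23 times to return the 20 matching columns, whereas a filter that could only answer INCLUDE or SKIP would be consulted for every one of the 100,000 columns.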

I'm in the process of adding more information about these filters to the HBase Reference Guide (the HBase Book).
