# Hive table engine

Not supported in ClickHouse Cloud.

The Hive engine lets you run `SELECT` queries on a Hive table stored in HDFS. It currently supports the following input formats:

* Text: supports only simple scalar column types, except `binary`
* ORC: supports simple scalar column types, except `char`; the only complex type supported is `array`
* Parquet: supports all simple scalar column types; the only complex type supported is `array`

## Creating a table

```sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [ALIAS expr1],
    name2 [type2] [ALIAS expr2],
    ...
) ENGINE = Hive('thrift://host:port', 'database', 'table')
PARTITION BY expr
```

See a detailed description of the [CREATE TABLE](/es/reference/statements/create/table) query.

The table structure can differ from the original Hive table structure:

* Column names must be the same as in the original Hive table, but you can use only some of them, in any order. You can also use alias columns computed from other columns.
* Column types must be the same as in the original Hive table.
* The partitioning expression must be consistent with the original Hive table, and the columns used in it must be part of the table structure.

**Engine parameters**

* `thrift://host:port`: Hive Metastore address.
* `database`: Remote database name.
* `table`: Remote table name.
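The structure rules above can be illustrated with a minimal sketch. It assumes the Hive table `test.test_orc` from the usage example on this page and a metastore reachable at `thrift://localhost:9083`. The ClickHouse table name `test_orc_subset` and the alias column `f_ts_date` are hypothetical names introduced only for this illustration.

```sql
-- Sketch only: declares a subset of the Hive columns, in a different order,
-- plus one alias column. Table name and alias name are hypothetical.
CREATE TABLE IF NOT EXISTS test.test_orc_subset
(
    `f_string` String,                            -- same name as the Hive column
    `f_int` Int32,                                -- order may differ from the Hive table
    `f_timestamp` DateTime,
    `f_ts_date` Date ALIAS toDate(f_timestamp),   -- alias computed from another column
    `day` String                                  -- partition column must be part of the structure
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_orc')
PARTITION BY day
```

Columns left out of the definition are simply not readable through this table. The partition column `day` is kept because the partitioning expression refers to it.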

## Usage example

### Using the local cache for the HDFS filesystem

We strongly recommend enabling the local cache for remote filesystems. Benchmarks show that queries are almost twice as fast with the cache enabled.

Before using the cache, add it to `config.xml`:

```xml
<local_cache_for_remote_fs>
    <enable>true</enable>
    <root_dir>local_cache</root_dir>
    <limit_size>559096952</limit_size>
    <bytes_read_before_flush>1048576</bytes_read_before_flush>
</local_cache_for_remote_fs>
```

* `enable`: If true, ClickHouse maintains a local cache for the remote filesystem (HDFS) after startup.
* `root_dir`: Required. The root directory where the local cache files for the remote filesystem are stored.
* `limit_size`: Required. The maximum size (in bytes) of the local cache files.
* `bytes_read_before_flush`: Controls how many bytes are read before flushing to the local filesystem when downloading a file from the remote filesystem. The default value is 1 MB.

### Querying a Hive table with the ORC input format

#### Create a table in Hive

```text
hive > CREATE TABLE `test`.`test_orc`(
  `f_tinyint` tinyint,
  `f_smallint` smallint,
  `f_int` int,
  `f_integer` int,
  `f_bigint` bigint,
  `f_float` float,
  `f_double` double,
  `f_decimal` decimal(10,0),
  `f_timestamp` timestamp,
  `f_date` date,
  `f_string` string,
  `f_varchar` varchar(100),
  `f_bool` boolean,
  `f_binary` binary,
  `f_array_int` array<int>,
  `f_array_string` array<string>,
  `f_array_float` array<float>,
  `f_array_array_int` array<array<int>>,
  `f_array_array_string` array<array<string>>,
  `f_array_array_float` array<array<float>>)
PARTITIONED BY (
  `day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://testcluster/data/hive/test.db/test_orc'

OK
Time taken: 0.51 seconds

hive > insert into test.test_orc partition(day='2021-09-18') select 1, 2, 3, 4, 5, 6.11, 7.22, 8.333, current_timestamp(), current_date(), 'hello world', 'hello world', 'hello world', true, 'hello world', array(1, 2, 3), array('hello world', 'hello world'), array(float(1.1), float(1.2)), array(array(1, 2), array(3, 4)), array(array('a', 'b'), array('c', 'd')), array(array(float(1.11), float(2.22)), array(float(3.33), float(4.44)));
OK
Time taken: 36.025 seconds

hive > select * from test.test_orc;
OK
1 2 3 4 5 6.11 7.22 8 2021-11-05 12:38:16.314 2021-11-05 hello world hello world hello world true hello world [1,2,3] ["hello world","hello world"] [1.1,1.2] [[1,2],[3,4]] [["a","b"],["c","d"]] [[1.11,2.22],[3.33,4.44]] 2021-09-18
Time taken: 0.295 seconds, Fetched: 1 row(s)
```

#### Create a table in ClickHouse

A ClickHouse table that retrieves data from the Hive table created above:

```sql
CREATE TABLE test.test_orc
(
    `f_tinyint` Int8,
    `f_smallint` Int16,
    `f_int` Int32,
    `f_integer` Int32,
    `f_bigint` Int64,
    `f_float` Float32,
    `f_double` Float64,
    `f_decimal` Float64,
    `f_timestamp` DateTime,
    `f_date` Date,
    `f_string` String,
    `f_varchar` String,
    `f_bool` Bool,
    `f_binary` String,
    `f_array_int` Array(Int32),
    `f_array_string` Array(String),
    `f_array_float` Array(Float32),
    `f_array_array_int` Array(Array(Int32)),
    `f_array_array_string` Array(Array(String)),
    `f_array_array_float` Array(Array(Float32)),
    `day` String
)
ENGINE = Hive('thrift://202.168.117.26:9083', 'test', 'test_orc')
PARTITION BY day
```

```sql
SELECT * FROM test.test_orc settings input_format_orc_allow_missing_columns = 1\G
```

```text
SELECT *
FROM test.test_orc
SETTINGS input_format_orc_allow_missing_columns = 1

Query id: c3eaffdc-78ab-43cd-96a4-4acc5b480658

Row 1:
──────
f_tinyint:            1
f_smallint:           2
f_int:                3
f_integer:            4
f_bigint:             5
f_float:              6.11
f_double:             7.22
f_decimal:            8
f_timestamp:          2021-12-04 04:00:44
f_date:               2021-12-03
f_string:             hello world
f_varchar:            hello world
f_bool:               true
f_binary:             hello world
f_array_int:          [1,2,3]
f_array_string:       ['hello world','hello world']
f_array_float:        [1.1,1.2]
f_array_array_int:    [[1,2],[3,4]]
f_array_array_string: [['a','b'],['c','d']]
f_array_array_float:  [[1.11,2.22],[3.33,4.44]]
day:                  2021-09-18

1 rows in set. Elapsed: 0.078 sec.
```
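Beyond `SELECT *`, the same table can be queried like any other ClickHouse table. The following is a minimal sketch, assuming the `test.test_orc` table defined above. It reads a few columns, applies the standard array functions `length` and `arraySum` to the `Array(Int32)` column, and filters on the partition column `day`.

```sql
-- Sketch only: assumes the ClickHouse table test.test_orc defined above.
SELECT
    f_int,
    f_string,
    length(f_array_int) AS array_len,       -- number of elements in the array
    arraySum(f_array_int) AS array_total    -- sum of the array elements
FROM test.test_orc
WHERE day = '2021-09-18'                    -- filter on the partition column
SETTINGS input_format_orc_allow_missing_columns = 1
```

For the single row inserted above, `f_array_int` is `[1,2,3]`, so `array_len` would be 3 and `array_total` would be 6.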

### Querying a Hive table with the Parquet input format

#### Create a table in Hive

```text
hive > CREATE TABLE `test`.`test_parquet`(
  `f_tinyint` tinyint,
  `f_smallint` smallint,
  `f_int` int,
  `f_integer` int,
  `f_bigint` bigint,
  `f_float` float,
  `f_double` double,
  `f_decimal` decimal(10,0),
  `f_timestamp` timestamp,
  `f_date` date,
  `f_string` string,
  `f_varchar` varchar(100),
  `f_char` char(100),
  `f_bool` boolean,
  `f_binary` binary,
  `f_array_int` array<int>,
  `f_array_string` array<string>,
  `f_array_float` array<float>,
  `f_array_array_int` array<array<int>>,
  `f_array_array_string` array<array<string>>,
  `f_array_array_float` array<array<float>>)
PARTITIONED BY (
  `day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://testcluster/data/hive/test.db/test_parquet'

OK
Time taken: 0.51 seconds

hive > insert into test.test_parquet partition(day='2021-09-18') select 1, 2, 3, 4, 5, 6.11, 7.22, 8.333, current_timestamp(), current_date(), 'hello world', 'hello world', 'hello world', true, 'hello world', array(1, 2, 3), array('hello world', 'hello world'), array(float(1.1), float(1.2)), array(array(1, 2), array(3, 4)), array(array('a', 'b'), array('c', 'd')), array(array(float(1.11), float(2.22)), array(float(3.33), float(4.44)));
OK
Time taken: 36.025 seconds

hive > select * from test.test_parquet;
OK
1 2 3 4 5 6.11 7.22 8 2021-12-14 17:54:56.743 2021-12-14 hello world hello world hello world true hello world [1,2,3] ["hello world","hello world"] [1.1,1.2] [[1,2],[3,4]] [["a","b"],["c","d"]] [[1.11,2.22],[3.33,4.44]] 2021-09-18
Time taken: 0.766 seconds, Fetched: 1 row(s)
```

#### Create a table in ClickHouse

A ClickHouse table that retrieves data from the Hive table created above:

```sql
CREATE TABLE test.test_parquet
(
    `f_tinyint` Int8,
    `f_smallint` Int16,
    `f_int` Int32,
    `f_integer` Int32,
    `f_bigint` Int64,
    `f_float` Float32,
    `f_double` Float64,
    `f_decimal` Float64,
    `f_timestamp` DateTime,
    `f_date` Date,
    `f_string` String,
    `f_varchar` String,
    `f_char` String,
    `f_bool` Bool,
    `f_binary` String,
    `f_array_int` Array(Int32),
    `f_array_string` Array(String),
    `f_array_float` Array(Float32),
    `f_array_array_int` Array(Array(Int32)),
    `f_array_array_string` Array(Array(String)),
    `f_array_array_float` Array(Array(Float32)),
    `day` String
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_parquet')
PARTITION BY day
```

```sql
SELECT * FROM test.test_parquet settings input_format_parquet_allow_missing_columns = 1\G
```

```text
SELECT *
FROM test_parquet
SETTINGS input_format_parquet_allow_missing_columns = 1

Query id: 4e35cf02-c7b2-430d-9b81-16f438e5fca9

Row 1:
──────
f_tinyint:            1
f_smallint:           2
f_int:                3
f_integer:            4
f_bigint:             5
f_float:              6.11
f_double:             7.22
f_decimal:            8
f_timestamp:          2021-12-14 17:54:56
f_date:               2021-12-14
f_string:             hello world
f_varchar:            hello world
f_char:               hello world
f_bool:               true
f_binary:             hello world
f_array_int:          [1,2,3]
f_array_string:       ['hello world','hello world']
f_array_float:        [1.1,1.2]
f_array_array_int:    [[1,2],[3,4]]
f_array_array_string: [['a','b'],['c','d']]
f_array_array_float:  [[1.11,2.22],[3.33,4.44]]
day:                  2021-09-18

1 rows in set. Elapsed: 0.357 sec.
```
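Because the Hive engine is intended for `SELECT` queries, one common pattern is to copy the rows you need into a local table. The sketch below is an assumption-laden illustration rather than a recommended layout: the target table `test.test_parquet_local`, its column selection, and its sorting key are hypothetical.

```sql
-- Sketch only: test.test_parquet_local is a hypothetical local table.
CREATE TABLE test.test_parquet_local
(
    `f_int` Int32,
    `f_string` String,
    `f_timestamp` DateTime,
    `day` String
)
ENGINE = MergeTree
ORDER BY (day, f_int);

-- Copy one Hive partition into the local table.
INSERT INTO test.test_parquet_local
SELECT f_int, f_string, f_timestamp, day
FROM test.test_parquet
WHERE day = '2021-09-18'
SETTINGS input_format_parquet_allow_missing_columns = 1;
```

Subsequent queries against `test.test_parquet_local` read local data and do not contact the Hive Metastore or HDFS.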

### Querying a Hive table with the Text input format

#### Create a table in Hive

```text
hive > CREATE TABLE `test`.`test_text`(
  `f_tinyint` tinyint,
  `f_smallint` smallint,
  `f_int` int,
  `f_integer` int,
  `f_bigint` bigint,
  `f_float` float,
  `f_double` double,
  `f_decimal` decimal(10,0),
  `f_timestamp` timestamp,
  `f_date` date,
  `f_string` string,
  `f_varchar` varchar(100),
  `f_char` char(100),
  `f_bool` boolean,
  `f_binary` binary,
  `f_array_int` array<int>,
  `f_array_string` array<string>,
  `f_array_float` array<float>,
  `f_array_array_int` array<array<int>>,
  `f_array_array_string` array<array<string>>,
  `f_array_array_float` array<array<float>>)
PARTITIONED BY (
  `day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://testcluster/data/hive/test.db/test_text'
Time taken: 0.1 seconds, Fetched: 34 row(s)

hive > insert into test.test_text partition(day='2021-09-18') select 1, 2, 3, 4, 5, 6.11, 7.22, 8.333, current_timestamp(), current_date(), 'hello world', 'hello world', 'hello world', true, 'hello world', array(1, 2, 3), array('hello world', 'hello world'), array(float(1.1), float(1.2)), array(array(1, 2), array(3, 4)), array(array('a', 'b'), array('c', 'd')), array(array(float(1.11), float(2.22)), array(float(3.33), float(4.44)));
OK
Time taken: 36.025 seconds

hive > select * from test.test_text;
OK
1 2 3 4 5 6.11 7.22 8 2021-12-14 18:11:17.239 2021-12-14 hello world hello world hello world true hello world [1,2,3] ["hello world","hello world"] [1.1,1.2] [[1,2],[3,4]] [["a","b"],["c","d"]] [[1.11,2.22],[3.33,4.44]] 2021-09-18
Time taken: 0.624 seconds, Fetched: 1 row(s)
```

#### Create a table in ClickHouse

A ClickHouse table that retrieves data from the Hive table created above:

```sql
CREATE TABLE test.test_text
(
    `f_tinyint` Int8,
    `f_smallint` Int16,
    `f_int` Int32,
    `f_integer` Int32,
    `f_bigint` Int64,
    `f_float` Float32,
    `f_double` Float64,
    `f_decimal` Float64,
    `f_timestamp` DateTime,
    `f_date` Date,
    `f_string` String,
    `f_varchar` String,
    `f_char` String,
    `f_bool` Bool,
    `day` String
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_text')
PARTITION BY day
```

```sql
SELECT * FROM test.test_text settings input_format_skip_unknown_fields = 1, input_format_with_names_use_header = 1, date_time_input_format = 'best_effort'\G
```

```text
SELECT *
FROM test.test_text
SETTINGS input_format_skip_unknown_fields = 1, input_format_with_names_use_header = 1, date_time_input_format = 'best_effort'

Query id: 55b79d35-56de-45b9-8be6-57282fbf1f44

Row 1:
──────
f_tinyint:   1
f_smallint:  2
f_int:       3
f_integer:   4
f_bigint:    5
f_float:     6.11
f_double:    7.22
f_decimal:   8
f_timestamp: 2021-12-14 18:11:17
f_date:      2021-12-14
f_string:    hello world
f_varchar:   hello world
f_char:      hello world
f_bool:      true
day:         2021-09-18
```