The NIPSEXT command-line tool

PyNIPS provides a command-line tool, nipsext, for getting information about NIPS data files and for extracting their records.

For general usage, run nipsext --help:

$ nipsext --help
usage: nipsext [-h] {info,rec,repl} ...

NIPS EXtraction Tool

options:
-h, --help       show this help message and exit

commands:
{info,rec,repl}
    info           Output information about the NIPS file.
    rec            Output records in the recfiles file format. This is a format usable by the GNU Recutils tools.
    repl           Load the NIPS file and start an interactive REPL.

Danger's over, Banana Breakfast is saved.

Detailed help is available for individual subcommands by providing the --help argument (e.g. nipsext info --help).

Information about NIPS files

Use the nipsext info command:

$ nipsext info HES.HAMLA.67.NIPS
# HES.HAMLA.67.NIPS

## Classification Record

Classification: XXXXXXXXXXXX

## Data File Control Record

Number of Periodic Sets: 1

### Fixed Set

|Element Name |Group |Control |System |Restricted |Location |Length |Mode |
|-------------|------|--------|-------|-----------|---------|-------|-----|
|+FIL         |False |True    |True   |True       |       5 |     1 |A    |
|+RCN         |False |True    |True   |True       |       6 |    13 |A    |
|CHAM         |False |True    |False  |False      |       6 |     1 |A    |
|HAMID        |True  |True    |False  |False      |       6 |     9 |A    |
|RECID        |True  |True    |False  |False      |       6 |    13 |A    |
|PHAM         |False |True    |False  |False      |       7 |     2 |A    |
|DHAM         |False |True    |False  |False      |       9 |     2 |A    |
|VHAM         |False |True    |False  |False      |      11 |     2 |A    |
|HHAM         |False |True    |False  |False      |      13 |     2 |A    |
|DATE         |False |True    |False  |False      |      15 |     4 |A    |
|+PCN         |False |True    |True   |True       |      19 |     1 |A    |
|+SC(0)       |False |True    |True   |True       |      20 |     4 |A    |
|+BSZ         |False |False   |True   |True       |      24 |     1 |A    |
|VSZ          |False |False   |True   |False      |      28 |     4 |B    |
|POPUL        |False |False   |False  |False      |      32 |     5 |B    |
|SECUR        |False |False   |False  |False      |      36 |     3 |B    |
|DEVEL        |False |False   |False  |False      |      40 |     3 |B    |
|CLASX        |False |False   |False  |False      |      44 |     3 |B    |
|CONFX        |False |False   |False  |False      |      48 |     3 |B    |
|RECTP        |False |False   |False  |False      |      52 |     1 |A    |
|VALID        |False |False   |False  |False      |      53 |     1 |A    |
|NAME         |False |False   |False  |False      |      54 |    16 |A    |
|XNAME        |False |False   |False  |False      |      70 |    14 |A    |
|NUMB         |False |False   |False  |False      |      84 |     3 |A    |
|POINT        |False |False   |False  |False      |      87 |     8 |A    |
|NPA          |False |False   |False  |False      |      95 |     1 |A    |
|CNTL6        |False |False   |False  |False      |      96 |     2 |A    |
|HTYPE        |False |False   |False  |False      |      98 |     3 |A    |
|CNTL7        |False |False   |False  |False      |     101 |     2 |A    |
|RDSTA        |False |False   |False  |False      |     103 |     2 |A    |
|CLAS         |False |False   |False  |False      |     105 |     1 |A    |
|POPRE        |False |False   |False  |False      |     106 |     1 |A    |
|EVALU        |True  |False   |False  |False      |     107 |    24 |A    |
|MLAC         |False |False   |False  |False      |     107 |     4 |A    |
|PLAC         |False |False   |False  |False      |     111 |     4 |A    |
|SECU         |False |False   |False  |False      |     115 |     4 |A    |
|ADPL         |False |False   |False  |False      |     119 |     4 |A    |
|HEW          |False |False   |False  |False      |     123 |     4 |A    |
|ECDV         |False |False   |False  |False      |     127 |     4 |A    |
|SCSTA        |False |False   |False  |False      |     131 |     2 |A    |
|PROB1        |False |False   |False  |False      |     133 |     5 |A    |
|URBAN        |False |False   |False  |False      |     138 |     1 |A    |
|PROB3        |False |False   |False  |False      |     139 |     1 |A    |
|ELECT        |False |False   |False  |False      |     140 |     3 |A    |
|PROB5        |False |False   |False  |False      |     143 |     1 |A    |
|PROB6        |False |False   |False  |False      |     144 |     1 |A    |
|PROB7        |False |False   |False  |False      |     145 |     1 |A    |
|PROB8        |False |False   |False  |False      |     146 |     1 |A    |
|VISIT        |False |False   |False  |False      |     147 |     1 |A    |
|XPROB        |False |False   |False  |False      |     148 |    19 |A    |
|VCMR         |False |False   |False  |False      |     167 |     2 |A    |
|TEAMS        |False |False   |False  |False      |     169 |     8 |A    |

### Periodic Set 1

|Element Name |Group |Control |System |Restricted |Location |Length |Mode |
|-------------|------|--------|-------|-----------|---------|-------|-----|
|+SC(1)       |False |True    |True   |True       |      20 |     4 |A    |
|PSSQ1        |False |True    |True   |False      |      20 |     4 |A    |
|VSZ1         |False |False   |True   |False      |      28 |     4 |B    |
|POPU         |True  |False   |False  |False      |      32 |    40 |D    |
|VCPOP        |False |False   |False  |False      |      32 |     8 |D    |
|GVPOP        |False |False   |False  |False      |      40 |     8 |D    |
|NBPOP        |False |False   |False  |False      |      48 |     8 |D    |
|UKPOP        |False |False   |False  |False      |      56 |     8 |D    |
|TOPOP        |False |False   |False  |False      |      64 |     8 |D    |

## Record Count

|Record Type |Count   |
|------------|--------|
|B           |       1|
|C           |       1|
|F           |      61|
|L           |       2|
|R           |  154803|
|------------|--------|
|TOTAL       |  154868|

## Data File Record Count

|Set             |Count   |
|----------------|--------|
|Fixed Set       |  151814|
|Periodic Set  0 |    2989|
|----------------|--------|
|TOTAL           |  154803|
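
The element tables printed by nipsext info are pipe-delimited, so they can be post-processed with a few lines of Python. The following is a minimal sketch, not part of PyNIPS: the helper name parse_table is hypothetical, and the sample text is a shortened copy of the Periodic Set 1 table above.

```python
def parse_table(text):
    """Turn a pipe-delimited table into a list of dicts keyed by the header row."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("|"):
            continue  # skip headings and blank lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if all(set(cell) <= {"-"} for cell in cells):
            continue  # skip the |-----|-----| separator rows
        if header is None:
            header = cells  # the first table row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Shortened copy of the "Periodic Set 1" table shown above.
sample = """\
|Element Name |Group |Control |System |Restricted |Location |Length |Mode |
|-------------|------|--------|-------|-----------|---------|-------|-----|
|POPU         |True  |False   |False  |False      |      32 |    40 |D    |
|VCPOP        |False |False   |False  |False      |      32 |     8 |D    |
|TOPOP        |False |False   |False  |False      |      64 |     8 |D    |
"""

rows = parse_table(sample)
group_elements = [row["Element Name"] for row in rows if row["Group"] == "True"]
print(group_elements)
```

All cell values are returned as strings; converting Location and Length to integers is left to the caller.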

Extract records

The rec command can be used to extract data file records in the GNU Recfile <https://www.gnu.org/software/recutils/manual/html_node/The-Rec-Format.html#The-Rec-Format> format:

$ nipsext rec HES.HAMLA.67.NIPS
# -*- mode: rec -*-
# HES.HAMLA.67.NIPS

Record_Control_Group: 1010100006701
Set_ID: 0
%FIL: R
%RCN: 1010100006701
CHAM: 1
HAMID: 101010000
RECID: 1010100006701
PHAM: 01
DHAM: 01
VHAM: 00
HHAM: 00
DATE: 6701
%PCN: 0
%SC_0:
%BSZ: 6
VSZ: 0
POPUL: 0
SECUR: 0
DEVEL: 0
CLASX: 0
CONFX: 0
RECTP: 0
VALID:
NAME:
XNAME:
NUMB: 000
POINT: 00000000
NPA: 0
CNTL6: 00
HTYPE: 000
CNTL7: 00
RDSTA: 00
CLAS: 0
POPRE: 0
EVALU: 000000000000000000000000
MLAC: 0000
PLAC: 0000
SECU: 0000
ADPL: 0000
HEW: 0000
ECDV: 0000
SCSTA: 00
PROB1: NN
URBAN: 0
PROB3: 0
ELECT: 000
PROB5: N
PROB6: N
PROB7: N
PROB8: N
VISIT:
XPROB:
VCMR: 05
TEAMS: 00000000

Record_Control_Group: 1010100006701
Set_ID: 1
%SC_1: 5000
PSSQ1: 5000
VSZ1: 0
POPU: 0000000000000000000000000000000000000000
VCPOP: 00000000
GVPOP: 00000000
NBPOP: 00000000
UKPOP: 00000000
TOPOP: 00000000

See nipsext rec --help for additional arguments and options.
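
If the GNU Recutils tools are not at hand, output like the above can also be read directly. The sketch below is a deliberately simplified reader written for this page (the function name parse_rec is hypothetical): it assumes one "Field: value" pair per line, blank lines between records, and comment lines starting with "#". It does not handle other features of the rec format such as multi-line values.

```python
def parse_rec(text):
    """Parse simplified rec-format text into a list of dicts (one per record)."""
    records = []
    current = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            # A blank line closes the record collected so far, if any.
            if current:
                records.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name] = value.strip()
    if current:
        records.append(current)
    return records


# Shortened copy of the output shown above.
sample = """\
# -*- mode: rec -*-
# HES.HAMLA.67.NIPS

Record_Control_Group: 1010100006701
Set_ID: 0
CHAM: 1
VALID:
VCMR: 05

Record_Control_Group: 1010100006701
Set_ID: 1
PSSQ1: 5000
VSZ1: 0
"""

records = parse_rec(sample)
for record in records:
    print(record["Record_Control_Group"], record["Set_ID"])
```

Empty fields such as VALID come back as empty strings, and all values stay strings so that leading zeros are preserved.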

As CSV

Using the GNU recutils <https://www.gnu.org/software/recutils/> tools we can transform the output to CSV:

$ nipsext rec HES.HAMLA.67.NIPS -l 20 -i 1 | rec2csv
"Record_Control_Group","Set_ID","%SC_1","PSSQ1","VSZ1","POPU","VCPOP","GVPOP","NBPOP","UKPOP","TOPOP"
"1010100006701","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006702","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006703","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006704","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006705","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006706","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006707","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006708","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006709","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006710","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006711","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010100006712","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006701","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006702","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006703","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006704","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006705","1","5000","5000","0","0000000000000364000000000000000000000364","00000000","00000364","00000000","00000000","00000364"
"1010200006706","1","5000","5000","0","0000000000000364000000000000000000000364","00000000","00000364","00000000","00000000","00000364"
"1010200006707","1","5000","5000","0","0000000000000364000000000000000000000364","00000000","00000364","00000000","00000000","00000364"
"1010200006708","1","5000","5000","0","0000000000000364000000000000000000000364","00000000","00000364","00000000","00000000","00000364"

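The CSV output can then be consumed with Python's standard csv module. The following is a small sketch using two rows copied from the output above; interpreting TOPOP as an integer is an assumption made for illustration only.

```python
import csv
import io

# Header and two rows copied from the rec2csv output above.
sample = '''"Record_Control_Group","Set_ID","%SC_1","PSSQ1","VSZ1","POPU","VCPOP","GVPOP","NBPOP","UKPOP","TOPOP"
"1010200006704","1","5000","5000","0","0000000000000000000000000000000000000000","00000000","00000000","00000000","00000000","00000000"
"1010200006705","1","5000","5000","0","0000000000000364000000000000000000000364","00000000","00000364","00000000","00000000","00000364"
'''

rows = list(csv.DictReader(io.StringIO(sample)))

# Assumption: TOPOP holds a zero-padded decimal number.
totals = {row["Record_Control_Group"]: int(row["TOPOP"]) for row in rows}
print(totals)
```

To read from a pipeline instead of a string, pass sys.stdin to csv.DictReader in place of the io.StringIO object.
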
To an SQLite database

Records can also be exported to an SQLite <https://sqlite.org/> database:

$ nipsext sqlite HES.HAMLA.67.NIPS --db HES.HAMLA.67.sqlite

One SQL table is created for the fixed set and one for each periodic set in the data file. The record key is used as the primary key.

$ sqlite3 HES.HAMLA.67.sqlite -- .schema

CREATE TABLE FixedSet ( KEY TEXT PRIMARY KEY, RECORD_CONTROL_GROUP TEXT, SUBSET_CONTROL_GROUP TEXT , "+FIL" TEXT, "+RCN" TEXT, "CHAM" TEXT, "HAMID" TEXT, "RECID" TEXT, "PHAM" TEXT, "DHAM" TEXT, "VHAM" TEXT, "HHAM" TEXT, "DATE" TEXT, "+PCN" TEXT, "+SC(0)" TEXT, "+BSZ" TEXT, "VSZ" TEXT, "POPUL" TEXT, "SECUR" TEXT, "DEVEL" TEXT, "CLASX" TEXT, "CONFX" TEXT, "RECTP" TEXT, "VALID" TEXT, "NAME" TEXT, "XNAME" TEXT, "NUMB" TEXT, "POINT" TEXT, "NPA" TEXT, "CNTL6" TEXT, "HTYPE" TEXT, "CNTL7" TEXT, "RDSTA" TEXT, "CLAS" TEXT, "POPRE" TEXT, "EVALU" TEXT, "MLAC" TEXT, "PLAC" TEXT, "SECU" TEXT, "ADPL" TEXT, "HEW" TEXT, "ECDV" TEXT, "SCSTA" TEXT, "PROB1" TEXT, "URBAN" TEXT, "PROB3" TEXT, "ELECT" TEXT, "PROB5" TEXT, "PROB6" TEXT, "PROB7" TEXT, "PROB8" TEXT, "VISIT" TEXT, "XPROB" TEXT, "VCMR" TEXT, "TEAMS" TEXT);
CREATE TABLE PeriodicSet1 ( KEY TEXT PRIMARY KEY, RECORD_CONTROL_GROUP TEXT, SUBSET_CONTROL_GROUP TEXT , "+SC(1)" TEXT, "PSSQ1" TEXT, "VSZ1" TEXT, "POPU" TEXT, "VCPOP" TEXT, "GVPOP" TEXT, "NBPOP" TEXT, "UKPOP" TEXT, "TOPOP" TEXT);

This allows records to be queried using SQL statements. In particular, tables can be joined on shared fields such as RECORD_CONTROL_GROUP:

SELECT * FROM FixedSet JOIN PeriodicSet1 USING (RECORD_CONTROL_GROUP);
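
The same kind of join can be run from Python with the standard sqlite3 module. The sketch below does not open a real export: it builds an in-memory database with cut-down versions of the two tables (only a few of the columns from the schema above) and inserts one row per table. The field values are taken from the first record shown earlier; the KEY values "f1" and "p1" are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-ins for the exported tables (most columns omitted).
conn.executescript('''
CREATE TABLE FixedSet ( KEY TEXT PRIMARY KEY, RECORD_CONTROL_GROUP TEXT, "VCMR" TEXT );
CREATE TABLE PeriodicSet1 ( KEY TEXT PRIMARY KEY, RECORD_CONTROL_GROUP TEXT, "TOPOP" TEXT );
''')

# Hypothetical KEY values; the other values come from record 1010100006701.
conn.execute("INSERT INTO FixedSet VALUES (?, ?, ?)", ("f1", "1010100006701", "05"))
conn.execute("INSERT INTO PeriodicSet1 VALUES (?, ?, ?)", ("p1", "1010100006701", "00000000"))

query = '''
SELECT RECORD_CONTROL_GROUP, "VCMR", "TOPOP"
FROM FixedSet
JOIN PeriodicSet1 USING (RECORD_CONTROL_GROUP)
'''
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Against a real export, replace ":memory:" with the path passed to --db and drop the CREATE TABLE and INSERT statements.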

REPL

For interactive exploration of the structure and records in a NIPS data file, you may use the NIPSEXT REPL (read-eval-print loop):

$ nipsext repl HES.HAMLA.67.NIPS
Starting the NIPSEXT REPL.
NIPS data set has been loaded to the variable "data_set". Type "help(data_set)" for more information.
>>> data_set
<nips.data_set.DataSet object at 0x7f130864ab60>
>>> len(data_set.fixed_set)
151814
>>> data_set.fixed_set[0]
DataFileRecord(offset=5602, length=184, os_control=0, delete_code=b'\x00', type='R', record_control_group='1010100006701', set_id=0, fields={'+FIL': R, '+RCN': 1010100006701, 'CHAM': 1, 'HAMID': 101010000, 'RECID': 1010100006701, 'PHAM': 01, 'DHAM': 01, 'VHAM': 00, 'HHAM': 00, 'DATE': 6701, '+PCN': 0, '+SC(0)': , '+BSZ': 6, 'VSZ': 0, 'POPUL': 0, 'SECUR': 0, 'DEVEL': 0, 'CLASX': 0, 'CONFX': 0, 'RECTP': 0, 'VALID':  , 'NAME':                 , 'XNAME':               , 'NUMB': 000, 'POINT': 00000000, 'NPA': 0, 'CNTL6': 00, 'HTYPE': 000, 'CNTL7': 00, 'RDSTA': 00, 'CLAS': 0, 'POPRE': 0, 'EVALU': 000000000000000000000000, 'MLAC': 0000, 'PLAC': 0000, 'SECU': 0000, 'ADPL': 0000, 'HEW': 0000, 'ECDV': 0000, 'SCSTA': 00, 'PROB1': NN   , 'URBAN': 0, 'PROB3': 0, 'ELECT': 000, 'PROB5': N, 'PROB6': N, 'PROB7': N, 'PROB8': N, 'VISIT':  , 'XPROB':                    , 'VCMR': 05, 'TEAMS': 00000000})
>>> data_set.fixed_set[0]['HAMID']
101010000
>>> data_set.fixed_set[0]['HAMID'].element_format_record
ElementFormatRecord(offset=2152, length=148, os_control=0, delete_code=b'\xf0', type='F', element_name='HAMID', element_set_identification=0, element_type_identification=72, group=True, control=True, system=False, restricted=False, fixed_length_field=True, variable_length_field=False, variable_set_field=False, element_location=6, element_length=9, element_mode_specification='A', input_subroutine_conversion_name='', output_subroutine_conversion_name='', element_label_location=0, element_label_length=0, edit_mask_location=0, edit_mask_length=0, size_on_output=0, field_names_location=0, number_of_fields_making_up_the_group=0)
>>> exit()
Bye.
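
Helper functions can be pasted into the REPL to summarise records. The sketch below is not part of PyNIPS; count_field_values is a hypothetical helper. It relies only on what the session above shows (records can be indexed by field name) and additionally assumes that data_set.fixed_set can be iterated and that str() of a field value gives its text. Plain dicts stand in for records here so that the example runs on its own.

```python
from collections import Counter
from itertools import islice


def count_field_values(records, field_name, limit=None):
    """Count how often each value of one field occurs in the first `limit` records."""
    return Counter(str(record[field_name]) for record in islice(records, limit))


# Stand-in records (plain dicts) so the sketch is self-contained.
stand_in_records = [{"DATE": "6701"}, {"DATE": "6702"}, {"DATE": "6701"}]

counts = count_field_values(stand_in_records, "DATE")
print(counts.most_common())
```

Inside the REPL, the call would look like count_field_values(data_set.fixed_set, 'DATE', limit=1000), subject to the assumptions above.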