DiskANN/apps/utils/bin_to_fvecs.cpp

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT license.
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include "util.h"
// Converts one block of npts vectors from the .bin layout (ndims floats per
// vector) to the .fvecs layout (a 4-byte dimension count followed by ndims
// floats per vector).
// read_buf must hold npts * ndims floats; write_buf must hold npts * (ndims + 1) floats.
void block_convert(std::ifstream &readr, std::ofstream &writr, float *read_buf, float *write_buf, uint64_t npts,
                   uint64_t ndims)
{
    readr.read((char *)read_buf, npts * ndims * sizeof(float));
    uint32_t ndims_u32 = (uint32_t)ndims;
#pragma omp parallel for
    for (uint64_t i = 0; i < npts; i++)
    {
        // Each output record is the dimension count followed by the vector itself.
        memcpy(write_buf + i * (ndims + 1), &ndims_u32, sizeof(uint32_t));
        memcpy(write_buf + i * (ndims + 1) + 1, read_buf + i * ndims, ndims * sizeof(float));
    }
    writr.write((char *)write_buf, npts * (ndims * sizeof(float) + sizeof(uint32_t)));
}

int main(int argc, char **argv)
{
    if (argc != 3)
    {
        std::cout << argv[0] << " input_bin output_fvecs" << std::endl;
        exit(-1);
    }
    std::ifstream readr(argv[1], std::ios::binary);
    // The .bin header is two 32-bit integers: number of points, then dimensions.
    // The stream is left positioned at the first vector after these reads.
    int32_t npts_s32;
    int32_t ndims_s32;
    readr.read((char *)&npts_s32, sizeof(int32_t));
    readr.read((char *)&ndims_s32, sizeof(int32_t));
    size_t npts = npts_s32;
    size_t ndims = ndims_s32;
    std::cout << "Dataset: #pts = " << npts << ", # dims = " << ndims << std::endl;

    uint64_t blk_size = 131072;
    uint64_t nblks = ROUND_UP(npts, blk_size) / blk_size;
    std::cout << "# blks: " << nblks << std::endl;

    std::ofstream writr(argv[2], std::ios::binary);
    // Input vectors take ndims floats each; output records take one extra
    // 4-byte slot for the per-vector dimension count.
    float *read_buf = new float[npts * ndims];
    float *write_buf = new float[npts * (ndims + 1)];
    for (uint64_t i = 0; i < nblks; i++)
    {
        uint64_t cblk_size = std::min<uint64_t>(npts - i * blk_size, blk_size);
        block_convert(readr, writr, read_buf, write_buf, cblk_size, ndims);
        std::cout << "Block #" << i << " written" << std::endl;
    }

    delete[] read_buf;
    delete[] write_buf;

    writr.close();
    readr.close();
}